BACKGROUND OF THE INVENTION



The U.S. Government may own rights in the application pursuant to funding from the 
National Institutes of Health (DK49835). 

1. Field of the Invention

The present invention relates to the fields of biochemistry, cellular biology and molecular 
biology. More particularly, it relates to the field of protein biochemistry, and specifically, to the 
use of an assay for determining protein folding and solubility. 

2. Description of Related Art 

There are a wide variety of potential applications for a genetic system enabling rapid and 
efficient evaluation of protein solubility characteristics in vivo. One of the cornerstones of 
biotechnology is the ability to express target proteins in functional form in vivo in
genetically-engineered organisms. However, many important target proteins are not efficiently
expressed in soluble form in bacteria such as E. coli, due at least in part to the complexity of the
protein folding process in vivo (Houry et al., 1999). When a target protein
fails to be expressed in soluble form in vivo, the yield of soluble protein can often be improved
by optimizing various factors such as the primary sequence of the target protein (Huang et al.,
1996) or the genetic background or growth conditions of the bacterium (Hung et al., 1998;
Brown et al., 1997; Blackwell & Horgan, 1991; Bourot et al., 2000; Sugihara & Baldwin, 1988;
Wynn et al., 1992). However, existing assays for protein expression in soluble form are tedious,
usually requiring lysis and fractionation of cells followed by protein analysis by
SDS-polyacrylamide gel electrophoresis. Using this traditional approach, screening for protein
constructs and/or physiological conditions yielding improved solubility is inefficient, and genetic
selection is impossible. 

Protein folding diseases represent a second area in which protein solubility characteristics
are of vital medical and technological importance (Thomas et al., 1995; Dobson, 1999). These
diseases, which have proven particularly refractory to pharmaceutical development, are caused
either by misfolding of a protein during biosynthesis subsequent to acquiring some mutation
(Brown et al., 1997; Thomas et al., 1992; Rao et al., 1994) or by aberrant protein processing
leading to the formation of an aggregation-prone product, such as the peptide forming the
amyloid plaques associated with Alzheimer's disease (Tan & Pepys, 1994; Harper & Lansbury,
1997), SOD1 in amyotrophic lateral sclerosis (Bruijn et al., 1998), α-synuclein in Parkinson's
disease (Galvin et al., 1983), amyloid A and P deposits in systemic amyloidosis (Hind et al.,
1983), transthyretin fibrils in fatal familial insomnia (Colon & Kelly, 1992) and the intranuclear
inclusions associated with polyglutamine expansions which cause Huntington's disease (Martin
& Gusella, 1986; HDCRG, 1993; Davies et al., 1997), spinocerebellar ataxia (Wells & Warren,
1998), spinobulbar muscular atrophy (La Spada et al., 1991), and Machado-Joseph Disease
(Kawaguchi et al., 1994). The ability to rapidly and efficiently screen for protein solubility in
vivo could also be applied to the development of assays for pharmaceutical compounds
preventing the misfolding or aggregation of proteins involved in protein folding diseases (i.e.,
assays for compounds that prevent precipitation of such aggregation-prone proteins).

Thus, there remains a need in the field for improved methods of screening for protein 
folding and solubility. 

SUMMARY OF THE INVENTION 

The present invention involves the use of a genetic system based on structural
complementation (Richards & Vithayathil, 1959; Ullmann et al., 1967; Taniuchi & Anfinsen,
1971; Zabin & Villarejo, 1975; Pecorari et al., 1993; Schonberger et al., 1996) of a selectable
marker protein as the basis of a direct in vivo solubility assay. Structural
complementation involves the division of a protein into two component segments which must be
combined to form a stable and fully functional structure. The specific implementation of the
method is an adaptation of the classic α-complementation system of β-galactosidase (β-gal)
(Ullmann et al., 1967). However, the same concept could potentially be applied to other
selectable genetic markers like chloramphenicol transacetylase or even screenable markers like
the green fluorescent protein (although appropriately complementing fragments of these proteins
would have to be developed first). β-gal can be divided into two fragments (α and ω) capable of
associating with each other to form an active enzyme (Ullmann et al., 1967). Redistribution of
the α-fragment from the soluble to the insoluble fraction in E. coli cells leads to a reduction in
the level of β-gal activity, which can be assayed either during growth on indicator agar plates
using the chromogenic substrate X-gal, or in suspension culture. Fusion of the α-fragment to the
C-terminus of a target protein leads to the formation of a chimeric protein with solubility
properties similar to those of the target protein alone. Thus, β-gal activity levels report the
solubility of the target fusion. By contrast, three extant systems for monitoring solubility and
misfolding in vivo rely on the use of fusions with the full-length marker proteins β-gal (Lee et al.,
1990), GFP (Waldo et al., 1999) and CAT (Maxwell et al., 1999). It is well documented that the
solubility properties of protein fusions to intact marker enzymes tend to be dominated by the
solubility properties of the marker enzyme, as evidenced by the use of MBP (Ko et al., 1993;
Kapust et al., 1999), thioredoxin (Papouchado et al., 1997), and GST (Wang et al., 1999) fusions
to enhance the solubility of some otherwise insoluble protein constructs. The colorimetric
plate assay described here should be readily adapted to efficient high-throughput screening.

Thus, there is provided a method for assessing protein folding and/or solubility
comprising (a) providing an expression construct comprising (i) a gene encoding a fusion protein,
said fusion protein comprising a protein of interest fused to a first segment of a marker protein,
wherein said first segment does not affect the folding or solubility of the protein of interest, and
(ii) a promoter active in said host cell and operably linked to said gene, (b) expressing said fusion
protein in a host cell that also expresses a second segment of said marker protein, wherein said
second segment is capable of structural complementation with said first segment, and (c)
determining structural complementation, wherein a greater degree of structural complementation,
as compared to structural complementation observed with appropriate negative controls,
indicates proper folding and/or solubility of said protein.
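
The comparison in step (c) can be sketched numerically. The following Python fragment is a minimal illustration only, assuming that marker enzyme activity readings (in arbitrary units) are available for the test fusion and for a negative control. The function name, the fold-change cutoff of 2.0 and the example readings are hypothetical and are not taken from the specification.

```python
from statistics import mean


def complementation_call(sample_activity, negative_control_activity, min_fold=2.0):
    """Compare replicate activity readings against a negative control.

    Returns (fold_over_control, scored_soluble). The cutoff `min_fold`
    is an arbitrary illustrative value, not one given in the specification.
    """
    control = mean(negative_control_activity)
    if control <= 0:
        raise ValueError("negative control activity must be positive")
    fold = mean(sample_activity) / control
    return fold, fold >= min_fold


# Hypothetical replicate readings in arbitrary activity units.
fold, soluble = complementation_call([120.0, 130.0, 110.0], [10.0, 12.0, 8.0])
print(f"fold over control: {fold:.1f}; scored soluble: {soluble}")
```

In practice the cutoff would be chosen empirically for each target and control pair; the value used here only shows where such a threshold would enter the comparison.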

The fusion may be N- or C-terminal to said protein of interest. The marker protein may 
be selected from the group consisting of a target binding protein, an enzyme, a protein inhibitor, 
and a chromophore. Examples include ubiquitin, green fluorescent protein, blue fluorescent 
protein, yellow fluorescent protein, luciferase, aequorin, β-galactosidase, cytochrome c,
chymotrypsin inhibitor, RNase, phosphoglycerate kinase, invertase, staphylococcal nuclease,
thioredoxin C, lactose permease, aminoacyl-tRNA synthetase, and dihydrofolate reductase. In the
particular case of β-galactosidase, the first segment is the α-peptide of β-galactosidase, and said
second segment is the ω-peptide of β-galactosidase. In certain embodiments the marker protein
is associated with a detectable phenotype, including enzymatic activity, chromophore or 
fluorophore activity. 

The protein of interest may be Alzheimer's amyloid peptide (Aβ), SOD1, presenilin 1
and 2, α-synuclein, amyloid A, amyloid P, CFTR, transthyretin, amylin, lysozyme, gelsolin, p53,
rhodopsin, insulin, insulin receptor, fibrillin, α-ketoacid dehydrogenase, collagen, keratin,
PRNP, immunoglobulin light chain, atrial natriuretic peptide, seminal vesicle exocrine protein,
β2-microglobulin, PrP, precalcitonin, ataxin 1, ataxin 2, ataxin 3, ataxin 6, ataxin 7, huntingtin,
androgen receptor, CREB-binding protein, dentatorubral pallidoluysian atrophy-associated
protein, maltose-binding protein, ABC transporter, glutathione S transferase, and thioredoxin.



The gene encoding the second segment may be carried on a chromosome of said host cell
or episomally. The host cell may be a bacterial cell, an insect cell, a yeast cell, a nematode cell,
or a mammalian cell. Examples include E. coli, C. elegans, S. frugiperda, and a variety of
mammalian cells. Preferred promoters include the Tac promoter, T7 promoter, or lac promoter
(bacterial), Cup, ADH, Gal (yeast) or PepCk or tk (mammalian).

In a particular embodiment, the method utilizes a negative control that is a host cell lacking
the second segment of said marker protein and/or a fusion protein that is improperly folded
and/or insoluble.

In another embodiment, there is provided a method for screening protein folding and/or
solubility mutants comprising (a) providing a gene encoding a fusion protein comprising (i) a
protein of interest and (ii) a first segment of a marker protein, wherein said first segment does not
affect the folding or solubility of the protein of interest, and a promoter active in said
host cell and operably linked to said gene, wherein said fusion protein is not properly folded
and/or soluble when expressed in said host cell, (b) mutagenizing that portion of the gene
encoding said protein of interest, (c) expressing said fusion protein in a host cell that expresses a
second segment of said marker protein, wherein said second segment is capable of structural
complementation with said first segment, and (d) determining structural complementation,
wherein a relative increase in structural complementation, as compared to the structural
complementation observed with the unmutagenized fusion protein, indicates an increase in
proper folding and/or solubility of said protein.

In yet another embodiment, there is provided a method for screening a candidate modulator
substance that modulates protein folding and/or solubility comprising (a) providing an
expression construct comprising (i) a gene encoding a fusion protein, said fusion protein
comprising a protein of interest fused to a first segment of a marker protein, wherein said first 
segment does not affect the folding or solubility of the protein of interest, and (ii) a promoter 
active in said host cell and operably linked to said gene, (b) expressing said fusion protein in a 
host cell that expresses a second segment of said marker protein, wherein said second segment is 
capable of structural complementation with said first segment, (c) contacting the host cell with 
said candidate modulator substance; and (d) determining structural complementation, wherein a 
relative change in structural complementation, as compared to the structural complementation 
observed in the absence of said candidate modulator substance, indicates that said candidate 
modulator substance is a modulator of protein folding and/or solubility. The candidate 
modulator substance may be a protein, a nucleic acid or a small molecule. 
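
The readout in step (d) is a relative change against the untreated host cell. A minimal Python sketch of that comparison follows, assuming one complementation signal per candidate substance. The compound names, the signal values and the 50% change threshold are hypothetical placeholders, not data or parameters from the specification.

```python
def relative_change(treated, untreated):
    """Fractional change in complementation signal caused by a candidate."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return (treated - untreated) / untreated


def flag_modulators(signals, untreated, threshold=0.5):
    """Return candidates whose signal changes by at least `threshold`.

    Both increases and decreases are reported, since a modulator may
    either promote or impair folding and/or solubility.
    """
    hits = {}
    for name, treated in signals.items():
        change = relative_change(treated, untreated)
        if abs(change) >= threshold:
            hits[name] = change
    return hits


# Hypothetical screen: signal in arbitrary units per candidate substance.
screen = {"compound_A": 40.0, "compound_B": 21.0, "compound_C": 8.0}
hits = flag_modulators(screen, untreated=20.0)
print(hits)
```

Reporting the signed change keeps candidates that raise the signal distinguishable from those that lower it.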

Following long-standing patent language convention, the terms "a" or "an," when used in
conjunction with "comprising" in the description and claims herein, may mean one or more than
one.

BRIEF DESCRIPTION OF THE DRAWINGS 

The following drawings form part of the present specification and are included to further 
demonstrate certain aspects of the present invention. The invention may be better understood by 
reference to one or more of these drawings in combination with the detailed description of 
specific embodiments presented herein. 



FIG. 1A and 1B: An in vivo solubility assay based on structural complementation. (FIG.
1A) A schematic depicting the complementation solubility assay. P (squares) represents the
target protein, and α (triangles) and ω (trapezoids) represent each of the complementing
fragments of the tetrameric β-galactosidase. Brackets indicate the concentration dependence of
the assay regarding the availability of soluble (folded) target/α fusion. Kd is indicated solely to
highlight the concentration-dependent equilibrium association/dissociation reaction. (FIG. 1B)
A schematic representation of the target protein/α-fragment C-terminal fusion expression
construct (α-fragment, residues 7-58 from full length β-galactosidase). "HA" indicates the
position of the inserted influenza hemagglutinin (HA) immuno-tag (residue sequence
YPYDVPDYA) present in some of the constructs examined.

FIG. 2. Correlation of β-galactosidase activity with fusion protein solubility and folding.
A scatter plot correlating the in vitro β-galactosidase activity measured in cell lysates (see Table
1) with the fraction soluble (open circles) and the reported periplasmic yield (filled squares) for
each of the MBP/α-fragment fusion proteins examined.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

Protein misfolding is the basis of a number of human diseases. It also presents a sizable
obstacle to the production of functional recombinant proteins. In addition, there is a tremendous
potential to modulate in vivo function of proteins by modulating protein folding. To date, the
study of misfolding and its circumvention has required development of specific assays for each
individual case.



For maximum utility, a general method for assaying protein folding and solubility should
provide an easily measured signal,
be sensitive to subtle changes in the solubility of the target protein over a wide concentration
range, allow phenotypic selection of the soluble protein, and have minimal effect on the
solubility of the target protein. The present invention offers each of these advantages.

The present invention utilizes generalized fusion constructs and the phenomenon of
"structural complementation" to examine protein folding and/or solubility in cell- or
organism-based screening. In a particular embodiment, the α and ω peptides of β-galactosidase are used,
the first as a fusion partner for a given protein of interest, in a complementation assay. Where
the protein of interest is properly folded, the fusion remains soluble and can associate with the other
peptide of β-galactosidase, permitting enzyme activity and detection. A variety of different host
cells, "structural complementation" pairs (enzymes, binding proteins, chromophores) and target
proteins can be used.

The studies presented herein demonstrate that this system reliably reports on the
solubility of eight fused target proteins: the maltose binding protein and mutants thereof, the
first nucleotide binding domains of the cystic fibrosis transmembrane conductance regulator and
the branched chain amino acid transporter from the hyperthermophilic archaeon Methanococcus
jannaschii, and the Aβ peptide of the Alzheimer's precursor protein. The fact that the signal
produced by the fusions is proportional to the solubility of the nucleotide binding domain targets
when expressed without the α-fragment indicates that this relatively small polypeptide does not
significantly affect the solubility of the target protein, unlike fusions to a larger marker protein
(e.g., MBP, Harper and Lansbury, 1997). This could provide a significant advantage over
recently reported solubility monitoring systems that rely on fusions with larger soluble proteins,
namely full length β-gal (Lee et al., 1990), GFP (Waldo et al., 1999) and CAT (Maxwell et al.,
1999). It is well-documented that fusions with highly soluble proteins such as GST (Wang et al.,
1999), MBP (Ko et al., 1993), thioredoxin (Papouchado et al., 1997), and the
immunoglobulin binding domain (GB1) (Huth et al., 1997) significantly improve the solubility
properties of a variety of expressed proteins. Thus, it is reasonable to expect that in some cases,
GFP and CAT may have a significant effect on the solubility of the target.

As mentioned above, this system has several potential uses. For example, recombinant 
production systems can be tested to determine if the polypeptide to be produced is properly 
folded. In addition, target proteins may be diagnostic of disease states. The system also could 
find utility in the development and selection of bacterial strains particularly effective at 
expressing and folding heterologous proteins, or for phenotypic selection of a wide variety of 
proteins in their study by random mutagenesis. These powerful approaches currently are limited 
to proteins which themselves are required for a measurable cellular function. Thus, the present
solubility detection system provides an important avenue for understanding fundamental
biological processes such as how primary sequence directs the formation of a unique 
three-dimensional structure, or the identity and mechanisms of cellular systems important for 
efficient protein maturation. 

One aspect of the invention is the minimal impact of the fusion partners on the protein of
interest. The presence of only "systematic" effects (i.e., similar both in the presence and absence
of either drug or mutation) on the solubility of the target permits ready comparison. This
actually provides the added advantage of being able to adjust the sensitivity of the assay
depending on the target protein of interest. The recent discovery of mutations in the α subunit
permits "tuning" of the α-ω interaction, which also can be used for altering the sensitivity.
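
The effect of tuning the fragment interaction on assay sensitivity can be illustrated with a deliberately simplified model. The Python sketch below assumes a 1:1 binding equilibrium between the soluble fusion and the complementing fragment, with the free fusion concentration approximated by the total soluble fusion concentration; it ignores tetramer assembly, and the concentrations and dissociation constants are arbitrary illustrative numbers rather than measured values.

```python
def fraction_complemented(soluble_fusion, kd):
    """Fraction of complementing fragment bound in a simple 1:1 model.

    soluble_fusion and kd must be in the same (arbitrary) concentration
    units. Assumes free fusion is approximately equal to soluble_fusion.
    """
    if soluble_fusion < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and kd must be > 0")
    return soluble_fusion / (kd + soluble_fusion)


# Illustrative scan: a weaker interaction (larger kd) shifts the responsive
# range of the readout toward higher soluble fusion concentrations.
for kd in (0.1, 1.0, 10.0):
    row = [round(fraction_complemented(c, kd), 2) for c in (0.1, 1.0, 10.0)]
    print(f"kd={kd}: {row}")
```

Under this assumption the signal is most responsive to changes in soluble fusion near the dissociation constant, which is one way to read the statement that altering the interaction alters the sensitivity.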

Perhaps the most exciting application of the system is the discovery of drugs which
modulate the folding of disease related proteins. Previously, the search for pharmaceuticals has
focused on the identification of compounds which inhibit cellular processes. However, the
increasing prevalence of diseases associated with protein misfolding such as Huntington's
disease, Alzheimer's disease, Parkinson's disease, cystic fibrosis, amyotrophic lateral sclerosis,
Creutzfeldt-Jakob disease, and some forms of diabetes and cancer presents a new challenge for
the pharmaceutical industry. The identification of drugs which target proteins with a propensity
to misfold requires the development of novel screening and assay methodologies such as the
α-complementation system described herein. Encouraging evidence that such pharmaceuticals
may be identified has recently been provided by Rastinejad and co-workers (Foster et al., 1999)
who reported the identification of a class of compounds which stabilized a folding mutant of p53
in a soluble and functional conformation, thereby rescuing its ability to prevent tumor growth in
mice.

Various aspects of the invention are described, in greater detail, in the following pages. 

A. Protein Folding and Mutant Proteins

Several diseases, such as Alzheimer's disease, Parkinson's disease, Huntington's disease,
and others are thought to be the result of, or associated with, protein misfolding in vivo. In certain
embodiments, the present invention provides a method of assaying for the presence of protein
misfolding in a living cell.

Proteins expressed through recombinant means often misfold, particularly in prokaryotic
host cells that lack the processing machinery of a eukaryotic cell. When a protein misfolds, it
often becomes less soluble, and may precipitate in the cell as an inclusion body. Additionally,
mutations in naturally occurring proteins can increase the rate of misfolding when endogenously
expressed, as well as when exogenously expressed in a recombinant host cell. In certain
embodiments, the present invention allows various mutations, whether natural or produced by
the hand of man, to be assayed for their ability to increase or decrease protein misfolding in vivo.

1. Fusion Proteins 

An aspect of the present invention is the discovery that peptides, polypeptides or proteins,
useful for alpha complementation, may be joined to a larger soluble protein, polypeptide or
peptide, wherein the folding reaction is dominated by the soluble protein, polypeptide or peptide.
The soluble protein, peptide or polypeptide may have the same length or amino acid sequence as
the endogenously produced protein, polypeptide or peptide. In other embodiments, the soluble
protein, peptide or polypeptide may be a truncated protein, protein domain or protein fragment of
a larger peptide chain. For example, the soluble fragments of a membrane
embedded or otherwise hydrophobic protein may be used to create a fusion protein.

Fusion proteins are produced by operatively linking at least one nucleic acid encoding at 
least one amino acid sequence to at least a second nucleic acid encoding at least a second amino 
acid sequence, so that the encoded sequences are translated as a contiguous amino acid sequence 
either in vitro or in vivo. Fusion protein design and expression are well known in the art, and
methods of fusion protein expression are described herein, and in references, such as, for 
example, U.S. Patent 5,935,824, incorporated herein by reference. 

In certain embodiments, a peptide, polypeptide or protein may be joined at or near the
N-terminal or C-terminal end of a soluble protein, peptide or polypeptide. In certain embodiments,
it is contemplated that the alpha complementing peptide or polypeptide may be attached to the
soluble protein, peptide or polypeptide via a linker moiety. One such linker is another peptide,
such as described in U.S. Patent 5,990,275, incorporated herein by reference.

2. Mutagenesis 

Where employed, mutagenesis will be accomplished by a variety of standard mutagenic
procedures. Mutation is the process whereby changes occur in the quantity or structure of the
genetic material of an organism. Mutation can involve modification of the nucleotide sequence
of a single gene, blocks of genes or whole chromosomes. Changes in single genes may be the
consequence of point mutations, which involve the removal, addition or substitution of a single
nucleotide base within a DNA sequence, or they may be the consequence of changes involving
the insertion or deletion of large numbers of nucleotides.

Mutations can arise spontaneously as a result of events such as errors in the fidelity of
DNA replication or the movement of transposable genetic elements (transposons) within the
genome. They also are induced following exposure to chemical or physical mutagens. Such
mutation-inducing agents include ionizing radiation, ultraviolet light and a diverse array of
chemicals such as alkylating agents and polycyclic aromatic hydrocarbons, all of which are
capable of interacting either directly or indirectly (generally following some metabolic
biotransformations) with nucleic acids. The DNA lesions induced by such environmental agents
may lead to modifications of base sequence when the affected DNA is replicated or repaired and
thus to a mutation. Mutation also can be site-directed through the use of particular targeting
methods.
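
The three kinds of point mutation named above (substitution, addition and removal of a single nucleotide) can be illustrated on a short sequence. The Python sketch below is illustrative only; the function names and the nine-base example sequence are hypothetical and are not part of the described method.

```python
def substitute(seq, pos, base):
    """Replace the base at zero-based position `pos` with `base`."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Insert `bases` immediately before zero-based position `pos`."""
    if not 0 <= pos <= len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos, length=1):
    """Remove `length` bases starting at zero-based position `pos`."""
    if not 0 <= pos <= len(seq) - length:
        raise IndexError("deletion extends outside sequence")
    return seq[:pos] + seq[pos + length:]


wild_type = "ATGGCCAAA"  # hypothetical example sequence
print(substitute(wild_type, 4, "A"))
print(insert(wild_type, 3, "T"))
print(delete(wild_type, 3))
```

A single-base insertion or deletion changes the sequence length, whereas a substitution does not; this is why the former two shift the reading frame of everything downstream.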

a. Random Mutagenesis 

i) Insertional Mutagenesis

Insertional mutagenesis is based on the inactivation of a gene via insertion of a known 
DNA fragment. Because it involves the insertion of some type of DNA fragment, the mutations 
generated are generally loss-of-function, rather than gain-of-function mutations. However, there 
are several examples of insertions generating gain-of-function mutations (Oppenheimer et al.,
1991). Insertion mutagenesis has been very successful in bacteria and Drosophila (Cooley et al.,
1988) and recently has become a powerful tool in corn (Schmidt et al., 1987), Arabidopsis
(Marks et al., 1991; Koncz et al., 1990), and Antirrhinum (Sommer et al., 1990).

Transposable genetic elements are DNA sequences that can move (transpose) from one 
place to another in the genome of a cell. The first transposable elements to be recognized were
the Activator/Dissociation elements of Zea mays. Since then, they have been identified in a wide 
range of organisms, both prokaryotic and eukaryotic. 

Transposable elements in the genome are characterized by being flanked by direct repeats
of a short sequence of DNA that has been duplicated during transposition and is called a target
site duplication. Virtually all transposable elements, whatever their type and mechanism of
transposition, make such duplications at the site of their insertion. In some cases the number of
bases duplicated is constant; in other cases it may vary with each transposition event. Most
transposable elements have inverted repeat sequences at their termini. These terminal inverted
repeats may be anything from a few bases to a few hundred bases long, and in many cases they
are known to be necessary for transposition.
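
The target site duplication described above can be shown with a toy string model. The following Python sketch is a simplification for illustration, not a model of any particular transposase: the function name, the example sequence, the four-base target site and the "[IS]" placeholder for the inserted element are all hypothetical.

```python
def insert_with_tsd(genome, site, tsd_length, element):
    """Insert `element` so that the target site ends up on both flanks.

    genome[site:site + tsd_length] is the short target sequence; after
    insertion one copy precedes the element and one copy follows it,
    forming the direct repeats of a target site duplication.
    """
    if site < 0 or tsd_length < 0 or site + tsd_length > len(genome):
        raise ValueError("target site must lie within the genome")
    target = genome[site:site + tsd_length]
    upstream = genome[:site]
    downstream = genome[site + tsd_length:]
    return upstream + target + element + target + downstream


result = insert_with_tsd("AACCGGTTAA", site=2, tsd_length=4, element="[IS]")
print(result)
```

The output grows by the length of the element plus the length of the duplicated target site, which matches the flanking direct repeats noted in the text.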

Prokaryotic transposable elements have been most studied in E. coli and Gram negative
bacteria, but also are present in Gram positive bacteria. They are generally termed insertion
sequences if they are less than about 2 kb long, or transposons if they are longer.
Bacteriophages such as mu and D108, which replicate by transposition, make up a third type of
transposable element. Elements of each type encode at least one polypeptide, a transposase,
required for their own transposition. Transposons often further include genes coding for functions
unrelated to transposition, for example, antibiotic resistance genes.

Transposons can be divided into two classes according to their structure. First, 
compound or composite transposons have copies of an insertion sequence element at each end, 
usually in an inverted orientation. These transposons require transposases encoded by one of 
their terminal IS elements. The second class of transposon has terminal repeats of about 30
base pairs and does not contain sequences from IS elements.

Transposition usually is either conservative or replicative, although in some cases it can
be both. In replicative transposition, one copy of the transposing element remains at the donor 
site, and another is inserted at the target site. In conservative transposition, the transposing 
element is excised from one site and inserted at another. 

Eukaryotic elements also can be classified according to their structure and mechanism of
transposition. The primary distinction is between elements that transpose via an RNA
intermediate, and elements that transpose directly from DNA to DNA. 

Elements that transpose via an RNA intermediate often are referred to as 
retrotransposons, and their most characteristic feature is that they encode polypeptides that are 
believed to have reverse transcriptase activity. There are two types of retrotransposon. Some
resemble the integrated proviral DNA of a retrovirus in that they have long direct repeat 
sequences, long terminal repeats (LTRs), at each end. The similarity between these 
retrotransposons and proviruses extends to their coding capacity. They contain sequences related 
to the gag and pol genes of a retrovirus, suggesting that they transpose by a mechanism related to 
a retroviral life cycle. Retrotransposons of the second type have no terminal repeats. They also 
code for gag- and pol-like polypeptides and transpose by reverse transcription of RNA 
intermediates, but do so by a mechanism that differs from that of retrovirus-like elements.
Transposition by reverse transcription is a replicative process and does not require excision of an 
element from a donor site. 

Transposable elements are an important source of spontaneous mutations, and have 
influenced the ways in which genes and genomes have evolved. They can inactivate genes by 
inserting within them, and can cause gross chromosomal rearrangements either directly, through 
the activity of their transposases, or indirectly, as a result of recombination between copies of an 
element scattered around the genome. Transposable elements that excise often do so imprecisely 
and may produce alleles coding for altered gene products if the number of bases added or deleted 
is a multiple of three.

Transposable elements themselves may evolve in unusual ways. If they were inherited
like other DNA sequences, then copies of an element in one species would be more like copies in 
closely related species than copies in more distant species. This is not always the case, 
suggesting that transposable elements are occasionally transmitted horizontally from one species 
to another.

ii) Chemical Mutagenesis

Chemical mutagenesis offers certain advantages, such as the ability to find a full range of
mutant alleles with degrees of phenotypic severity, and is facile and inexpensive to perform. The
majority of chemical carcinogens produce mutations in DNA. Benzo[a]pyrene,
N-acetoxy-2-acetylaminofluorene and aflatoxin B1 cause GC to TA transversions in bacteria and
mammalian cells. Benzo[a]pyrene also can produce base substitutions such as AT to TA. N-nitroso
compounds produce GC to AT transitions. Alkylation of the O4 position of thymine induced by
exposure to N-nitrosoureas results in TA to CG transitions.
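
The distinction between transitions and transversions used above follows from base chemistry: a transition exchanges a purine for a purine or a pyrimidine for a pyrimidine, while a transversion exchanges one class for the other. The short Python sketch below applies that rule to the base changes mentioned in this passage; the function name is hypothetical and the sketch considers only the base on one strand.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Base changes named in the text, written for one strand of the pair.
for ref, alt in (("G", "T"), ("A", "T"), ("G", "A"), ("T", "C")):
    print(f"{ref} to {alt}: {classify_substitution(ref, alt)}")
```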

A high correlation between mutagenicity and carcinogenicity is the underlying assumption
behind the Ames test (McCann et al., 1975), which speedily assays for mutants in a bacterial
system, together with an added rat liver homogenate, which contains the microsomal cytochrome
P450, to provide the metabolic activation of the mutagens where needed.

In vertebrates, several carcinogens have been found to produce mutation in the ras
proto-oncogene. N-nitroso-N-methyl urea induces mammary, prostate and other carcinomas in rats
with the majority of the tumors showing a G to A transition at the second position in codon 12 of
the Ha-ras oncogene. Benzo[a]pyrene-induced skin tumors contain an A to T transversion in the
second codon of the Ha-ras gene.

iii) Radiation Mutagenesis

The integrity of biological molecules is degraded by ionizing radiation. Absorption
of the incident energy leads to the formation of ions and free radicals, and breakage of some
covalent bonds. Susceptibility to radiation damage appears quite variable between molecules,
and between different crystalline forms of the same molecule. It depends on the total
accumulated dose, and also on the dose rate (as once free radicals are present, the molecular
damage they cause depends on their natural diffusion rate and thus upon real time). Damage is
reduced and controlled by making the sample as cold as possible.

Ionizing radiation causes DNA damage and cell killing, generally proportional to the
dose rate. Ionizing radiation has been postulated to induce multiple biological effects by direct
interaction with DNA, or through the formation of free radical species leading to DNA damage.
These effects include gene mutations, malignant transformation, and cell killing. Although
ionizing radiation has been demonstrated to induce expression of certain DNA repair genes in
some prokaryotic and lower eukaryotic cells, little is known about the effects of ionizing
radiation on the regulation of mammalian gene expression (Borek, 1985). Several studies have
described changes in the pattern of protein synthesis observed after irradiation of mammalian
cells. For example, ionizing radiation treatment of human malignant melanoma cells is
associated with induction of several unidentified proteins (Boothman et al., 1989). Synthesis of
cyclin and co-regulated polypeptides is suppressed by ionizing radiation in rat REF52 cells, but
not in oncogene-transformed REF52 cell lines (Lambert and Borek, 1988). Other studies have
demonstrated that certain growth factors or cytokines may be involved in x-ray-induced DNA
damage. In this regard, platelet-derived growth factor is released from endothelial cells after
irradiation (Witte et al., 1989).

In the present invention, the term "ionizing radiation" means radiation comprising
particles or photons that have sufficient energy or can produce sufficient energy via nuclear
interactions to produce ionization (gain or loss of electrons). An exemplary and preferred
ionizing radiation is an x-radiation. The amount of ionizing radiation needed in a given cell
generally depends upon the nature of that cell. Typically, an effective expression-inducing dose
is less than a dose of ionizing radiation that causes cell damage or death directly. Means for
determining an effective amount of radiation are well known in the art.

In certain embodiments, an effective expression inducing amount is from about 2 to
about 30 Gray (Gy) administered at a rate of from about 0.5 to about 2 Gy/minute. Even more
preferably, an effective expression inducing amount of ionizing radiation is from about 5 to
about 15 Gy. In other embodiments, doses of 2-9 Gy are used in single doses. An effective dose
of ionizing radiation may be from 10 to 100 Gy, with 15 to 75 Gy being preferred, and 20 to 50
Gy being more preferred.
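
By way of illustration only, the relationship between a total dose and a constant dose rate can be sketched as simple arithmetic. The function name `exposure_minutes` is hypothetical and is not part of the disclosure; the sketch assumes the dose rate is held constant for the whole exposure.

```python
def exposure_minutes(dose_gy: float, rate_gy_per_min: float) -> float:
    """Return the exposure time (minutes) needed to deliver dose_gy
    at a constant dose rate of rate_gy_per_min."""
    if dose_gy < 0 or rate_gy_per_min <= 0:
        raise ValueError("dose must be non-negative and rate must be positive")
    return dose_gy / rate_gy_per_min


# Extremes implied by the ranges of about 2 to 30 Gy and about 0.5 to 2 Gy/minute
shortest = exposure_minutes(2.0, 2.0)    # lowest dose at the highest rate
longest = exposure_minutes(30.0, 0.5)    # highest dose at the lowest rate
print(shortest, longest)
```

Under these assumptions the stated ranges correspond to exposures of roughly one minute to one hour.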

Any suitable means for delivering radiation to a tissue may be employed in the present
invention in addition to external means. For example, radiation may be delivered by first
providing a radiolabeled antibody that immunoreacts with an antigen of the tumor, followed by
delivering an effective amount of the radiolabeled antibody to the tumor. In addition,
radioisotopes may be used to deliver ionizing radiation to a tissue or cell.

iv) In Vitro Scanning Mutagenesis
Random mutagenesis also may be introduced using error prone PCR (Cadwell and Joyce,
1992). The rate of mutagenesis may be increased by performing PCR in multiple tubes with
dilutions of templates.

One particularly useful mutagenesis technique is alanine scanning mutagenesis in which
a number of residues are substituted individually with the amino acid alanine so that the effects
of losing side-chain interactions can be determined, while minimizing the risk of large-scale
perturbations in protein conformation (Cunningham et al., 1989).
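
The individual substitutions made in an alanine scan can be enumerated with a short sketch such as the following. It is illustrative only: the function name `alanine_scan` and the five-residue example sequence are hypothetical, and positions are numbered from 1 as is conventional for protein residues.

```python
def alanine_scan(protein, positions=None):
    """Return (label, variant) pairs, each variant carrying a single
    residue replaced by alanine. Positions are 1-based."""
    if positions is None:
        positions = range(1, len(protein) + 1)
    variants = []
    for pos in positions:
        wild_type = protein[pos - 1]
        if wild_type == "A":
            continue  # residue is already alanine, nothing to substitute
        variant = protein[:pos - 1] + "A" + protein[pos:]
        variants.append((f"{wild_type}{pos}A", variant))
    return variants


# Hypothetical five-residue peptide; position 3 is skipped because it is alanine
variants = alanine_scan("MKAVL")
for label, sequence in variants:
    print(label, sequence)
```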

In recent years, techniques for estimating the equilibrium constant for ligand binding
using minuscule amounts of protein have been developed (Blackburn et al., 1991; U.S. Patents
5,221,605 and 5,238,808). The ability to perform functional assays with small amounts of
material can be exploited to develop highly efficient, in vitro methodologies for the saturation
mutagenesis of antibodies. The inventors bypassed cloning steps by combining PCR mutagenesis
with coupled in vitro transcription/translation for the high throughput generation of protein
mutants. Here, the PCR products are used directly as the template for the in vitro
transcription/translation of the mutant single chain antibodies. Because of the high efficiency
with which all 19 amino acid substitutions can be generated and analyzed in this way, it is now
possible to perform saturation mutagenesis on numerous residues of interest, a process that can
be described as in vitro scanning saturation mutagenesis (Burks et al., 1997).
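
The size of a scanning saturation library follows directly from the 19 possible substitutions per residue. The sketch below enumerates such a library in silico; it is a minimal illustration, and the names `saturate_position` and `scanning_saturation` and the example peptide are hypothetical rather than taken from the cited work.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def saturate_position(protein, pos):
    """Return {label: variant} for all 19 substitutions at 1-based position pos."""
    wild_type = protein[pos - 1]
    return {
        f"{wild_type}{pos}{aa}": protein[:pos - 1] + aa + protein[pos:]
        for aa in AMINO_ACIDS
        if aa != wild_type
    }


def scanning_saturation(protein, positions):
    """Combine the single-site saturation sets for several residues of interest."""
    library = {}
    for pos in positions:
        library.update(saturate_position(protein, pos))
    return library


library = scanning_saturation("MKAVL", [1, 2, 3])  # hypothetical peptide
print(len(library))  # 19 variants per scanned residue
```

Scanning n residues in this way yields 19 x n single-site mutants, which is why a cloning-free route to each mutant is attractive.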

In vitro scanning saturation mutagenesis provides a rapid method for obtaining a large
amount of structure-function information including: (i) identification of residues that modulate
ligand binding specificity, (ii) a better understanding of ligand binding based on the
identification of those amino acids that retain activity and those that abolish activity at a given
location, (iii) an evaluation of the overall plasticity of an active site or protein subdomain, and
(iv) identification of amino acid substitutions that result in increased binding.

v) Random Mutagenesis by Fragmentation and Reassembly
A method for generating libraries of displayed polypeptides is described in U.S. Patent 
5,380,721. The method comprises obtaining polynucleotide library members, pooling and 
fragmenting the polynucleotides, and reforming fragments therefrom, performing PCR 
amplification, thereby homologously recombining the fragments to form a shuffled pool of 
recombined polynucleotides. 
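
The pooling, fragmenting and reforming steps can be pictured with a deliberately simplified toy model. The sketch below is an assumption-laden illustration, not the method of the cited patent: it treats the parents as pre-aligned sequences of equal length, cuts them at fixed intervals, and picks each segment from a randomly chosen parent, whereas actual reassembly depends on homology-driven priming during PCR. The name `shuffle_parents` and the parent sequences are hypothetical.

```python
import random


def shuffle_parents(parents, fragment_len, rng):
    """Build one chimeric sequence by taking each fixed-length segment
    from a randomly chosen, pre-aligned parent (toy model of shuffling)."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model requires parents of equal length")
    segments = []
    for start in range(0, length, fragment_len):
        donor = rng.choice(parents)
        segments.append(donor[start:start + fragment_len])
    return "".join(segments)


rng = random.Random(0)  # seeded so the sketch is repeatable
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]  # hypothetical parents
pool = [shuffle_parents(parents, 4, rng) for _ in range(5)]
for chimera in pool:
    print(chimera)
```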

b. Site-Directed Mutagenesis 

Structure-guided site-specific mutagenesis represents a powerful tool for the dissection 
and engineering of protein-ligand interactions. The technique provides for the preparation and 
testing of sequence variants by introducing one or more nucleotide sequence changes into a 
selected DNA. 

Site-specific mutagenesis uses specific oligonucleotide sequences which encode the DNA 
sequence of the desired mutation, as well as a sufficient number of adjacent, unmodified 
nucleotides. In this way, a primer sequence is provided with sufficient size and complexity to 
form a stable duplex on both sides of the deletion junction being traversed. A primer of about 17 
to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of the junction 
of the sequence being altered. 
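
The primer geometry described above, a changed codon flanked on both sides by unmodified template sequence, can be sketched as follows. This is a minimal illustration: the function name `design_mutagenic_primer`, the template sequence and the replacement codon are all hypothetical, and no melting-temperature or secondary-structure checks are attempted.

```python
def design_mutagenic_primer(template, codon_start, new_codon, flank=10):
    """Return a primer carrying new_codon in place of the codon beginning at
    0-based index codon_start, with `flank` unmodified bases on each side."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three nucleotides")
    left = codon_start - flank
    right = codon_start + 3 + flank
    if left < 0 or right > len(template):
        raise ValueError("not enough flanking sequence in the template")
    return template[left:codon_start] + new_codon + template[codon_start + 3:right]


# Hypothetical 36-nucleotide coding stretch; replace the sixth codon (GAA) with GCA
template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTT"
primer = design_mutagenic_primer(template, codon_start=15, new_codon="GCA", flank=10)
print(primer, len(primer))
```

With a 10-nucleotide flank on each side the primer is 23 nucleotides long, which falls within the preferred range of about 17 to 25 nucleotides.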

The technique typically employs a bacteriophage vector that exists in both a
single-stranded and double-stranded form. Vectors useful in site-directed mutagenesis include
vectors such as the M13 phage. These phage vectors are commercially available and their use is
generally well known to those skilled in the art. Double-stranded plasmids are also routinely
employed in site-directed mutagenesis, which eliminates the step of transferring the gene of
interest from a phage to a plasmid.

In general, one first obtains a single-stranded vector, or melts two strands of a
double-stranded vector, which includes within its sequence a DNA sequence encoding the desired
protein or genetic element. An oligonucleotide primer bearing the desired mutated sequence,
synthetically prepared, is then annealed with the single-stranded DNA preparation, taking into
account the degree of mismatch when selecting hybridization conditions. The hybridized
product is subjected to DNA polymerizing enzymes such as E. coli polymerase I (Klenow
fragment) in order to complete the synthesis of the mutation-bearing strand. Thus, a
heteroduplex is formed, wherein one strand encodes the original non-mutated sequence, and the
second strand bears the desired mutation. This heteroduplex vector is then used to transform
appropriate host cells, such as E. coli cells, and clones are selected that include recombinant
vectors bearing the mutated sequence arrangement.

Comprehensive information on the functional significance and information content of a
given residue of a protein can best be obtained by saturation mutagenesis in which all 19 amino
acid substitutions are examined. The shortcoming of this approach is that the logistics of
multiresidue saturation mutagenesis are daunting (Warren et al., 1996; Zeng et al., 1996; Yelton
et al., 1995; Hilton et al., 1996). Hundreds, and possibly even thousands, of site specific mutants
must be studied. However, improved techniques make production and rapid screening of
mutants much more straightforward. See also, U.S. Patents 5,798,208 and 5,830,650, for a
description of "walk-through" mutagenesis.

Other methods of site-directed mutagenesis are disclosed in U.S. Patents 5,220,007; 
5,284,760; 5,354,670; 5,366,878; 5,389,514; 5,635,377; and 5,789,166. 

B. Protein Expression
1. Vectors

Once the soluble protein, polypeptide or peptide encoding sequence(s) and alpha
complementing protein, polypeptide or peptide encoding sequence(s) are selected, they may be
operatively expressed in a recombinant vector. The expression may be in vivo or in vitro, to
assay the refolding and complementation process. The term "vector" is used to refer to a carrier
nucleic acid molecule into which a nucleic acid sequence can be inserted for introduction into a
cell where it can be replicated. A nucleic acid sequence can be "exogenous," which means that it
is foreign to the cell into which the vector is being introduced or that the sequence is homologous
to a sequence in the cell but in a position within the host cell nucleic acid in which the sequence
is ordinarily not found. Vectors include plasmids, cosmids, viruses (bacteriophage, animal
viruses, and plant viruses), and artificial chromosomes (e.g., YACs). One of skill in the art
would be well equipped to construct a vector through standard recombinant techniques, which
are described in Sambrook et al., 1989 and Ausubel et al., 1994, both incorporated herein by
reference.

The term "expression vector" refers to a vector containing a nucleic acid sequence coding
for at least part of a gene product capable of being transcribed. In some cases, RNA molecules
are then translated into a protein, polypeptide, or peptide. In other cases, these sequences are not
translated, for example, in the production of antisense molecules or ribozymes. Expression
vectors can contain a variety of "control sequences," which refer to nucleic acid sequences
necessary for the transcription and possibly translation of an operably linked coding sequence in
a particular host organism. In addition to control sequences that govern transcription and
translation, vectors and expression vectors may contain nucleic acid sequences that serve other
functions as well and are described infra.

a. Promoters and Enhancers 

A "promoter" is a control sequence that is a region of a nucleic acid sequence at which
initiation and rate of transcription are controlled. It may contain genetic elements at which
regulatory proteins and molecules may bind such as RNA polymerase and other transcription
factors. The phrases "operatively positioned," "operatively linked," "under control," and "under
transcriptional control" mean that a promoter is in a correct functional location and/or orientation
in relation to a nucleic acid sequence to control transcriptional initiation and/or expression of that
sequence. A promoter may or may not be used in conjunction with an "enhancer," which refers
to a cis-acting regulatory sequence involved in the transcriptional activation of a nucleic acid
sequence.

A promoter may be one naturally associated with a gene or sequence, as may be obtained
by isolating the 5' non-coding sequences located upstream of the coding segment and/or exon.
Such a promoter can be referred to as "endogenous." Similarly, an enhancer may be one
naturally associated with a nucleic acid sequence, located either downstream or upstream of that
sequence. Alternatively, certain advantages will be gained by positioning the coding nucleic
acid segment under the control of a recombinant or heterologous promoter, which refers to a
promoter that is not normally associated with a nucleic acid sequence in its natural environment.
A recombinant or heterologous enhancer refers also to an enhancer not normally associated with
a nucleic acid sequence in its natural environment. Such promoters or enhancers may include
promoters or enhancers of other genes, and promoters or enhancers isolated from any other
prokaryotic, viral, or eukaryotic cell, and promoters or enhancers not "naturally occurring," i.e.,
containing different elements of different transcriptional regulatory regions, and/or mutations
that alter expression. In addition to producing nucleic acid sequences of promoters and
enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic
acid amplification technology, including PCR™, in connection with the compositions disclosed
herein (see U.S. Patent 4,683,202, U.S. Patent 5,928,906, each incorporated herein by reference).
Furthermore, it is contemplated that control sequences that direct transcription and/or expression
of sequences within non-nuclear organelles such as mitochondria, chloroplasts, and the like, can
be employed as well.

Naturally, it will be important to employ a promoter and/or enhancer that effectively
directs the expression of the DNA segment in the cell type, organelle, and organism chosen for
expression. Those of skill in the art of molecular biology generally know the use of promoters,
enhancers, and cell type combinations for protein expression; for example, see Sambrook et al.
(1989), incorporated herein by reference. The promoters employed may be constitutive,
tissue-specific, inducible, and/or useful under the appropriate conditions to direct high level
expression of the introduced DNA segment, such as is advantageous in the large-scale production
of recombinant proteins and/or peptides. The promoter may be heterologous or endogenous.

Table 1 lists several elements/promoters that may be employed, in the context of the
present invention, to regulate the expression of a gene. This list is not intended to be exhaustive
of all the possible elements involved in the promotion of expression but, merely, to be exemplary
thereof. Table 2 provides examples of inducible elements, which are regions of a nucleic acid
sequence that can be activated in response to a specific stimulus.





TABLE 1
Promoter and/or Enhancer

Promoter/Enhancer | References
Immunoglobulin Heavy Chain | Banerji et al., 1983; Gilles et al., 1983; Grosschedl et al., 1985; Atchinson et al., 1986, 1987; Imler et al., 1987; Weinberger et al., 1984; Kiledjian et al., 1988; Porton et al., 1990
Immunoglobulin Light Chain | Queen et al., 1983; Picard et al., 1984
T-Cell Receptor | Luria et al., 1987; Winoto et al., 1989; Redondo et al., 1990
HLA DQ α and/or DQ β | Sullivan et al., 1987
β-Interferon | Goodbourn et al., 1986; Fujita et al., 1987; Goodbourn et al., 1988
Interleukin-2 | Greene et al., 1989
Interleukin-2 Receptor | Greene et al., 1989; Lin et al., 1990
MHC Class II 5 | Koch et al., 1989
MHC Class II HLA-DRα | Sherman et al., 1989
β-Actin | Kawamoto et al., 1988; Ng et al., 1989
Muscle Creatine Kinase (MCK) | Jaynes et al., 1988; Horlick et al., 1989; Johnson et al., 1989
Prealbumin (Transthyretin) | Costa et al., 1988
Elastase I | Ornitz et al., 1987
Metallothionein (MTII) | Karin et al., 1987; Culotta et al., 1989
Collagenase | Pinkert et al., 1987; Angel et al., 1987
Albumin | Pinkert et al., 1987; Tronche et al., 1989, 1990
α-Fetoprotein | Godbout et al., 1988; Campere et al., 1989
γ-Globin | Bodine et al., 1987; Perez-Stable et al., 1990
β-Globin | Trudel et al., 1987
c-fos | Cohen et al., 1987
c-HA-ras | Triesman, 1986; Deschamps et al., 1985
Insulin | Edlund et al., 1985
Neural Cell Adhesion Molecule (NCAM) | Hirsh et al., 1990
α1-Antitrypsin | Latimer et al., 1990
H2B (TH2B) Histone | Hwang et al., 1990
Mouse and/or Type I Collagen | Ripe et al., 1989
Glucose-Regulated Proteins (GRP94 and GRP78) | Chang et al., 1989
Rat Growth Hormone | Larsen et al., 1986
Human Serum Amyloid A (SAA) | Edbrooke et al., 1989
Troponin I (TN I) | Yutzey et al., 1989
Platelet-Derived Growth Factor (PDGF) | Pech et al., 1989
Duchenne Muscular Dystrophy | Klamut et al., 1990
SV40 | Banerji et al., 1981; Moreau et al., 1981; Sleigh et al., 1985; Firak et al., 1986; Herr et al., 1986; Imbra et al., 1986; Kadesch et al., 1986; Wang et al., 1986; Ondek et al., 1987; Kuhl et al., 1987; Schaffner et al., 1988
Polyoma | Swartzendruber et al., 1975; Vasseur et al., 1980; Katinka et al., 1980, 1981; Tyndell et al., 1981; Dandolo et al., 1983; de Villiers et al., 1984; Hen et al., 1986; Satake et al., 1988; Campbell and/or Villarreal, 1988
Retroviruses | Kriegler et al., 1982, 1983; Levinson et al., 1982; Kriegler et al., 1983, 1984a, b, 1988; Bosze et al., 1986; Miksicek et al., 1986; Celander et al., 1987; Thiesen et al., 1988; Celander et al., 1988; Choi et al., 1988; Reisman et al., 1989
Papilloma Virus | Campo et al., 1983; Lusky et al., 1983; Spandidos and/or Wilkie, 1983; Spalholz et al., 1985; Lusky et al., 1986; Cripe et al., 1987; Gloss et al., 1987; Hirochika et al., 1987; Stephens et al., 1987; Glue et al., 1988
Hepatitis B Virus | Bulla et al., 1986; Jameel et al., 1986; Shaul et al., 1987; Spandau et al., 1988; Vannice et al., 1988
Human Immunodeficiency Virus | Muesing et al., 1987; Hauber et al., 1988; Jakobovits et al., 1988; Feng et al., 1988; Takebe et al., 1988; Rosen et al., 1988; Berkhout et al., 1989; Laspia et al., 1989; Sharp et al., 1989; Braddock et al., 1989
Cytomegalovirus (CMV) | Weber et al., 1984; Boshart et al., 1985; Foecking et al., 1986
Gibbon Ape Leukemia Virus | Holbrook et al., 1987; Quinn et al., 1989



TABLE 2
Inducible Elements

Element | Inducer | References
MTII | Phorbol Ester (TPA); Heavy metals | Palmiter et al., 1982; Haslinger et al., 1985; Searle et al., 1985; Stuart et al., 1985; Imagawa et al., 1987; Karin et al., 1987; Angel et al., 1987b; McNeall et al., 1989
MMTV (mouse mammary tumor virus) | Glucocorticoids | Huang et al., 1981; Lee et al., 1981; Majors et al., 1983; Chandler et al., 1983; Lee et al., 1984; Ponta et al., 1985; Sakai et al., 1988
β-Interferon | Poly(rI)x; Poly(rc) | Tavernier et al., 1983
Adenovirus 5 E2 | E1A | Imperiale et al., 1984
Collagenase | Phorbol Ester (TPA) | Angel et al., 1987a
Stromelysin | Phorbol Ester (TPA) | Angel et al., 1987b
SV40 | Phorbol Ester (TPA) | Angel et al., 1987b
Murine MX Gene | Interferon, Newcastle Disease Virus | Hug et al., 1988
GRP78 Gene | A23187 | Resendez et al., 1988
α-2-Macroglobulin | IL-6 | Kunz et al., 1989
Vimentin | Serum | Rittling et al., 1989
MHC Class I Gene H-2Kb | Interferon | Blanar et al., 1989
 | E1A, SV40 Large T Antigen | Taylor et al., 1989, 1990a, 1990b
Proliferin | Phorbol Ester-TPA | Mordacq et al., 1989
Tumor Necrosis Factor | PMA | Hensel et al., 1989
Thyroid Stimulating Hormone α Gene | Thyroid Hormone | Chatterjee et al., 1989



The identity of tissue-specific promoters or elements, as well as assays to characterize
their activity, is well known to those of skill in the art. Examples of such regions include the
human LIMK2 gene (Nomoto et al., 1999), the somatostatin receptor 2 gene (Kraus et al., 1998),
murine epididymal retinoic acid-binding gene (Lareyre et al., 1999), human CD4 (Zhao-Emonet
et al., 1998), mouse alpha2 (XI) collagen (Tsumaki et al., 1998), D1A dopamine receptor gene
(Lee et al., 1997), insulin-like growth factor II (Wu et al., 1997), and human platelet endothelial
cell adhesion molecule-1 (Almendro et al., 1996).



b. Initiation Signals and Internal Ribosome Binding Sites

A specific initiation signal also may be required for efficient translation of coding
sequences. These signals include the ATG initiation codon or adjacent sequences. Exogenous
translational control signals, including the ATG initiation codon, may need to be provided. One
of ordinary skill in the art would readily be capable of determining this and providing the
necessary signals. It is well known that the initiation codon must be "in-frame" with the reading
frame of the desired coding sequence to ensure translation of the entire insert. The exogenous
translational control signals and initiation codons can be either natural or synthetic. The
efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer
elements.
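
The "in-frame" requirement can be checked mechanically on a candidate construct. The following is a minimal sketch under stated assumptions: the function name `reads_through`, the coordinate convention (0-based start, exclusive end, with the insert's own stop codon excluded) and the short example constructs are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reads_through(construct, atg_index, insert_start, insert_end):
    """Return True if the ATG at atg_index is in frame with the insert and no
    stop codon occurs in that frame before insert_end (exclusive)."""
    if construct[atg_index:atg_index + 3] != "ATG":
        return False
    if (insert_start - atg_index) % 3 != 0:
        return False  # the insert would be read in the wrong frame
    codons = [construct[i:i + 3] for i in range(atg_index, insert_end - 2, 3)]
    return not any(codon in STOP_CODONS for codon in codons)


insert = "AAAGGAGAA"                          # hypothetical coding insert
in_frame = "GG" + "ATG" + "GCT" + insert      # linker keeps the reading frame
shifted = "GG" + "ATG" + "GCTC" + insert      # one extra base shifts the frame
print(reads_through(in_frame, 2, 8, 17))
print(reads_through(shifted, 2, 9, 18))
```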

In certain embodiments of the invention, internal ribosome entry site (IRES)
elements are used to create multigene, or polycistronic, messages. IRES elements are able to
bypass the ribosome scanning model of 5' methylated Cap dependent translation and begin
translation at internal sites (Pelletier and Sonenberg, 1988). IRES elements from two members
of the picornavirus family (polio and encephalomyocarditis) have been described (Pelletier and
Sonenberg, 1988), as well as an IRES from a mammalian message (Macejak and Sarnow, 1991).
IRES elements can be linked to heterologous open reading frames. Multiple open reading
frames can be transcribed together, each separated by an IRES, creating polycistronic messages.
By virtue of the IRES element, each open reading frame is accessible to ribosomes for efficient
translation. Multiple genes can be efficiently expressed using a single promoter/enhancer to
transcribe a single message (see U.S. Patents 5,925,565 and 5,935,819, herein incorporated by
reference).

c. Multiple Cloning Sites

Vectors can include a multiple cloning site (MCS), which is a nucleic acid region that
contains multiple restriction enzyme sites, any of which can be used in conjunction with standard
recombinant technology to digest the vector. (See Carbonelli et al., 1999, Levenson et al., 1998,
and Cocea, 1997, incorporated herein by reference.) "Restriction enzyme digestion" refers to
catalytic cleavage of a nucleic acid molecule with an enzyme that functions only at specific
locations in a nucleic acid molecule. Many of these restriction enzymes are commercially
available. Use of such enzymes is widely understood by those of skill in the art. Frequently, a
vector is linearized or fragmented using a restriction enzyme that cuts within the MCS to enable
exogenous sequences to be ligated to the vector. "Ligation" refers to the process of forming
phosphodiester bonds between two nucleic acid fragments, which may or may not be contiguous
with each other. Techniques involving restriction enzymes and ligation reactions are well known
to those of skill in the art of recombinant technology.
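
When choosing an enzyme to linearize a vector within the MCS, it is useful to know which recognition sites occur exactly once in the vector. The sketch below illustrates such a search for three common enzymes whose recognition sequences are palindromic, so scanning one strand suffices. The function names and the short vector sequence are hypothetical, and the enzyme table is a small illustrative subset.

```python
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, enzymes=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)
        hits[name] = positions
    return hits


def single_cutters(sequence, enzymes=RECOGNITION_SITES):
    """Return the enzymes that cut the sequence exactly once."""
    return sorted(name for name, pos in find_sites(sequence, enzymes).items()
                  if len(pos) == 1)


mcs = "GAATTC" + "GGATCC" + "AAGCTT"          # hypothetical three-site MCS
vector = "ATGC" + mcs + "TTGGATCCAA"          # backbone carries a second BamHI site
print(find_sites(vector))
print(single_cutters(vector))
```

In this example BamHI would cut the vector twice and release a fragment, so only EcoRI or HindIII would linearize it.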

d. Splicing Sites 

Most transcribed eukaryotic RNA molecules will undergo RNA splicing to remove
introns from the primary transcripts. Vectors containing genomic eukaryotic sequences may
require donor and/or acceptor splicing sites to ensure proper processing of the transcript for
protein expression. (See Chandler et al., 1997, herein incorporated by reference.)

e. Polyadenylation Signals

In expression, one will typically include a polyadenylation signal to effect proper
polyadenylation of the transcript. The nature of the polyadenylation signal is not believed to be
crucial to the successful practice of the invention, and/or any such sequence may be employed.
Preferred embodiments include the SV40 polyadenylation signal and/or the bovine growth
hormone polyadenylation signal, convenient and/or known to function well in various target
cells. Also contemplated as an element of the expression cassette is a transcriptional termination
site. These elements can serve to enhance message levels and/or to minimize read through from
the cassette into other sequences.

f. Origins of Replication

To propagate a vector in a host cell, the vector may contain one or more origin of
replication sites (often termed "ori"), each of which is a specific nucleic acid sequence at which
replication is initiated. Alternatively, an autonomously replicating sequence (ARS) can be
employed if the host cell is yeast.

g. Selectable and Screenable Markers 

In certain embodiments of the invention, a cell containing a nucleic acid construct of the
present invention may be identified in vitro or in vivo by including a marker in the
expression vector. Such markers would confer an identifiable change to the cell permitting easy
identification of cells containing the expression vector. Generally, a selectable marker is one that
confers a property that allows for selection. A positive selectable marker is one in which the
presence of the marker allows for its selection, while a negative selectable marker is one in
which its presence prevents its selection. An example of a positive selectable marker is a drug
resistance marker.

Usually the inclusion of a drug selection marker aids in the cloning and identification of
transformants; for example, genes that confer resistance to neomycin, puromycin, hygromycin,
DHFR, GPT, zeocin and histidinol are useful selectable markers. In addition to markers
conferring a phenotype that allows for the discrimination of transformants based on the
implementation of conditions, other types of markers including screenable markers such as GFP,
whose basis is colorimetric analysis, are also contemplated. Alternatively, screenable enzymes
such as herpes simplex virus thymidine kinase (tk) or chloramphenicol acetyltransferase (CAT)
may be utilized. One of skill in the art would also know how to employ immunologic markers,
possibly in conjunction with FACS analysis. The marker used is not believed to be important, so
long as it is capable of being expressed simultaneously with the nucleic acid encoding a gene
product. Further examples of selectable and screenable markers are well known to one of skill in
the art.

2. Host Cells 

As used herein, the terms "cell," "cell line," and "cell culture" may be used
interchangeably. All of these terms also include their progeny, which is any and all subsequent
generations. It is understood that all progeny may not be identical due to deliberate or
inadvertent mutations. In the context of expressing a heterologous nucleic acid sequence, "host
cell" refers to a prokaryotic or eukaryotic cell, and it includes any transformable organism that
is capable of replicating a vector and/or expressing a heterologous gene encoded by a vector. A
host cell can be, and has been, used as a recipient for vectors. A host cell may be "transfected" or
"transformed," which refers to a process by which exogenous nucleic acid is transferred or
introduced into the host cell. A transformed cell includes the primary subject cell and its
progeny.

Host cells may be derived from prokaryotes or eukaryotes, depending upon whether the
desired result is replication of the vector or expression of part or all of the vector-encoded
nucleic acid sequences. Prokaryotes include gram negative or positive cells. Numerous cell
lines and cultures are available for use as a host cell, and they can be obtained through the
American Type Culture Collection (ATCC), which is an organization that serves as an archive
for living cultures and genetic materials (www.atcc.org). An appropriate host can be determined
by one of skill in the art based on the vector backbone and the desired result. A plasmid or
cosmid, for example, can be introduced into a prokaryote host cell for replication of many
vectors. Bacterial cells used as host cells for vector replication and/or expression include DH5α,
JM109, and KC8, as well as a number of commercially available bacterial hosts such as SURE®
Competent Cells and Solopack™ Gold Cells (Stratagene®, La Jolla). Alternatively, bacterial
cells such as E. coli LE392 could be used as host cells for phage viruses.

Examples of eukaryotic host cells for replication and/or expression of a vector include C. 
elegans, HeLa, NIH3T3, Jurkat, 293, Cos, CHO, Saos, yeast, nematodes, insect cells, and PC12. 
Many host cells from various cell types and organisms are available and would be known to one 
of skill in the art. Similarly, a viral vector may be used in conjunction with either a eukaryotic or 
prokaryotic host cell, particularly one that is permissive for replication or expression of the 
vector. 

Some vectors may employ control sequences that allow them to be replicated and/or
expressed in both prokaryotic and eukaryotic cells. One of skill in the art would further
understand the conditions under which to incubate all of the above described host cells to
maintain them and to permit replication of a vector. Also understood and known are techniques
and conditions that would allow large-scale production of vectors, as well as production of the
nucleic acids encoded by vectors and their cognate polypeptides, proteins, or peptides.

3. Expression Systems 

Turning to the expression of the proteins of the present invention, once a suitable nucleic 
acid encoding sequence has been obtained, one may proceed to prepare an expression system. 
The engineering of DNA segment(s) for expression in a prokaryotic or eukaryotic system may be 
performed by techniques generally known to those of skill in recombinant expression. 

It is believed that virtually any expression system may be employed in the expression of
the proteins of the present invention. Prokaryote- and/or eukaryote-based systems can be
employed for use with the present invention to produce nucleic acid sequences, or their cognate
polypeptides, proteins and peptides. Many such systems are commercially and widely available.

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host
cell will generally process the genomic transcripts to yield functional mRNA for translation into
protein. Generally speaking, it may be more convenient to employ as the recombinant gene a
cDNA version of the gene. It is believed that the use of a cDNA version will provide advantages
in that the size of the gene will generally be much smaller and more readily employed to
transfect the targeted cell than will a genomic gene, which will typically be up to an order of
magnitude or more larger than the cDNA gene. However, it is contemplated that a genomic
version of a particular gene may be employed where desired.

It is contemplated that proteins, polypeptides or peptides may be co-expressed with other
selected proteins, polypeptides or peptides, wherein the proteins may be co-expressed in the
same cell or gene(s) may be provided to a cell that already has another selected protein.
Co-expression may be achieved by co-transfecting the cell with two distinct recombinant
vectors, each bearing a copy of either of the respective DNAs. Alternatively, a single
recombinant vector may be constructed to include the coding regions for both of the proteins,
which could then be expressed in cells transfected with the single vector. In either event, the
term "co-expression" herein refers to the expression of both at least one selected nucleic acid or
gene encoding one or more proteins, polypeptides or peptides and at least a second selected
nucleic acid or gene encoding at least one or more secondary selected proteins, polypeptides or
peptides in the same recombinant cell.

It is contemplated that proteins may be expressed in cell systems or grown in media that
enhance protein production. One such system is described in U.S. Patent 5,834,249,
incorporated herein by reference. In certain embodiments, the fusion protein may be
co-expressed with one or more proteins that enhance refolding. Such proteins that enhance
refolding include, for example, DsbA or DsbC proteins. A cell system co-expressing the DsbA
or DsbC proteins is described in U.S. Patent 5,639,635, incorporated herein by reference. In
certain embodiments, it is contemplated that a temperature sensitive expression vector may be
used to aid assaying protein folding at temperatures lower or higher than the optimum growth
temperature of many E. coli strains, about 37°C. For example, temperature sensitive expression
vectors and host cells that express proteins at or below 20°C are described in U.S. Patents
5,654,169 and 5,726,039, each incorporated herein by reference.

As used herein, the terms "engineered" and "recombinant" cells or host cells are intended
to refer to a cell into which an exogenous DNA segment or gene, such as a cDNA or gene
encoding at least one protein, polypeptide or peptide has been introduced. Therefore, engineered
cells are distinguishable from naturally occurring cells which do not contain a recombinantly
introduced exogenous DNA segment or gene. Engineered cells are thus cells having a gene or
genes introduced through the hand of man. Recombinant cells include those having an
introduced cDNA or genomic gene, and also include genes positioned adjacent to a promoter not
naturally associated with the particular introduced gene.

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B,
E. coli χ1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC
No. 273325); bacilli such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella
typhimurium, Serratia marcescens, and various Pseudomonas species.

In general, plasmid vectors containing replicon and control sequences which are derived
from species compatible with the host cell are used in connection with these hosts. The vector
ordinarily carries a replication site, as well as marking sequences which are capable of providing
phenotypic selection in transformed cells. For example, E. coli is often transformed using
derivatives of pBR322, a plasmid derived from an E. coli species. pBR322 contains genes for
ampicillin and tetracycline resistance and thus provides easy means for identifying transformed
cells. The pBR plasmid, or other microbial plasmid or phage must also contain, or be modified
to contain, promoters which can be used by the microbial organism for expression of its own
proteins.



In addition, phage vectors containing replicon and control sequences that are compatible 
with the host microorganism can be used as transforming vectors in connection with these hosts. 
For example, the phage lambda GEM™-11 may be utilized in making a recombinant phage
vector which can be used to transform host cells, such as E. coli LE392. 

Further useful vectors include pIN vectors (Inouye et al., 1985); and pGEX vectors, for
use in generating glutathione S-transferase (GST) soluble fusion proteins for later purification
and separation or cleavage. Other suitable fusion proteins are those with β-galactosidase,
ubiquitin, and the like.

Promoters that are most commonly used in recombinant DNA construction include the
β-lactamase (penicillinase), lactose and tryptophan (trp) promoter systems. While these are the
most commonly used, other microbial promoters have been discovered and utilized, and details
concerning their nucleotide sequences have been published, enabling those of skill in the art to
ligate them functionally with plasmid vectors.

a. Prokaryotic Expression 

The following details concerning recombinant protein production in bacterial cells, such
as E. coli, are provided by way of exemplary information on recombinant protein production in
general, the adaptation of which to a particular recombinant expression system will be known to
those of skill in the art.

Bacterial cells, for example, E. coli, containing the expression vector are grown in any of 
a number of suitable media, for example, LB. The expression of the recombinant protein may be 
induced, e.g., by adding IPTG to the media or by switching incubation to a higher temperature.
After culturing the bacteria for a further period, generally of between 2 and 24 hours, the cells 
are collected by centrifugation and washed to remove residual media. 

The bacterial cells are then lysed, for example, by disruption in a cell homogenizer and
centrifuged to separate the dense inclusion bodies and cell membranes from the soluble cell
components. This centrifugation can be performed under conditions whereby the dense
inclusion bodies are selectively enriched by incorporation of sugars, such as sucrose, into the
buffer and centrifugation at a selective speed.

If the recombinant protein is expressed in the inclusion bodies, as is the case in many 
instances, these can be washed in any of several solutions to remove some of the contaminating 
host proteins, then solubilized in solutions containing high concentrations of urea (e.g., 8 M) or
other chaotropic agents such as guanidine hydrochloride in the presence of reducing agents, such as
β-mercaptoethanol or DTT (dithiothreitol).

Under some circumstances, it may be advantageous to incubate the protein for several 
hours under conditions suitable for the protein to undergo a refolding process into a 
conformation which more closely resembles that of the native protein. Such conditions generally 
include low protein concentrations, less than 500 mg/ml, low levels of reducing agent, 
concentrations of urea less than 2 M and often the presence of reagents such as a mixture of 
reduced and oxidized glutathione which facilitate the interchange of disulfide bonds within the 
protein molecule. 

The refolding process can be monitored, for example, by SDS-PAGE, or with antibodies 
specific for the native molecule (which can be obtained from animals vaccinated with the native 
molecule or smaller quantities of recombinant protein). Following refolding, the protein can 
then be purified further and separated from the refolding mixture by chromatography on any of 
several supports including ion exchange resins, gel permeation resins or on a variety of affinity 
columns.

b. Eukaryotic Expression

In addition to micro-organisms, cultures of cells derived from multicellular organisms 
may also be used as hosts. In principle, any such cell culture is workable, whether from 
vertebrate or invertebrate culture. In addition to mammalian cells, these include insect cell 
systems infected with recombinant virus expression vectors (e.g., baculovirus); and plant cell 
systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, 
CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression 
vectors (e.g., Ti plasmid) containing one or more protein, polypeptide or peptide coding 
sequences. 

For expression in Saccharomyces, the plasmid YRp7, for example, is commonly used.
This plasmid already contains the trp1 gene which provides a selection marker for a mutant strain
of yeast lacking the ability to grow in the absence of tryptophan, for example ATCC No. 44076 or
PEP4-1. The presence of the trp1 lesion as a characteristic of the yeast host cell genome then
provides an effective environment for detecting transformation by growth in the absence of
tryptophan.

Suitable promoting sequences in yeast vectors include the promoters for 
3-phosphoglycerate kinase or other glycolytic enzymes, such as enolase, glyceraldehyde-3- 
phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6- 
phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, 
phosphoglucose isomerase, and glucokinase. In constructing suitable expression plasmids, the 
termination sequences associated with these genes are also ligated into the expression vector 3' 
of the sequence desired to be expressed to provide polyadenylation of the mRNA and 
termination. 

Other suitable promoters, which have the additional advantage of transcription controlled 
by growth conditions, include the promoter region for alcohol dehydrogenase 2, isocytochrome 
C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and the 
aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for
maltose and galactose utilization.

The insect cell/baculovirus system can produce a high level of protein expression of a
heterologous nucleic acid segment, such as described in U.S. Patent Nos. 5,871,986 and 4,879,236,
both herein incorporated by reference, and which can be bought, for example, under the name
MaxBac® 2.0 from Invitrogen® and BacPack™ Baculovirus Expression System from
Clontech®.

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The
protein, polypeptide or peptide coding sequences are cloned into non-essential regions (for
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for
example the polyhedrin promoter). Successful insertion of the coding sequences results in the
inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus
lacking the proteinaceous coat coded for by the polyhedrin gene). These recombinant viruses are
then used to infect Spodoptera frugiperda cells in which the inserted gene is expressed (e.g., U.S.
Patent No. 4,215,051, Smith, incorporated herein by reference).

Other examples of expression systems include Stratagene®'s Complete Control™
Inducible Mammalian Expression System, which involves a synthetic ecdysone-inducible
receptor, or its pET Expression System, an E. coli expression system. Another example of an
inducible expression system is available from Invitrogen®, which carries the T-Rex™
(tetracycline-regulated expression) System, an inducible mammalian expression system that uses
the full-length CMV promoter. Invitrogen® also provides a yeast expression system called the
Pichia methanolica Expression System, which is designed for high-level production of
recombinant proteins in the methylotrophic yeast Pichia methanolica. One of skill in the art
would know how to express a vector, such as an expression construct, to produce a nucleic acid
sequence or its cognate polypeptide, protein, or peptide.

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, 3T3, RIN and MDCK cell
lines. In addition, a host cell strain may be chosen that modulates the expression of the inserted
sequences, or modifies and processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be
important for the function of the protein.

Different host cells have characteristic and specific mechanisms for the post-translational
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign protein expressed.

Expression vectors for use in mammalian cells ordinarily include an origin of replication
(as necessary), a promoter located in front of the gene to be expressed, along with any necessary
ribosome binding sites, RNA splice sites, polyadenylation site, and transcriptional terminator
sequences. The origin of replication may be provided either by construction of the vector to
include an exogenous origin, such as may be derived from SV40 or other viral (e.g., Polyoma,
Adeno, VSV, BPV) source, or may be provided by the host cell chromosomal replication
mechanism. If the vector is integrated into the host cell chromosome, the latter is often
sufficient.

The promoters may be derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the
vaccinia virus 7.5K promoter). Further, it is also possible, and may be desirable, to utilize
promoter or control sequences normally associated with the gene sequence(s), provided such
control sequences are compatible with the host cell systems.

A number of viral-based expression systems may be utilized; for example, commonly
used promoters are derived from polyoma, Adenovirus 2, and most frequently Simian Virus 40
(SV40). The early and late promoters of SV40 virus are particularly useful because both are
obtained easily from the virus as a fragment which also contains the SV40 viral origin of
replication. Smaller or larger SV40 fragments may also be used, provided there is included the
approximately 250 bp sequence extending from the HindIII site toward the BglI site located in
the viral origin of replication.

In cases where an adenovirus is used as an expression vector, the coding sequences may
be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and
tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome by
in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome (e.g.,
region E1, E3, or E4) will result in a recombinant virus that is viable and capable of expressing
proteins, polypeptides or peptides in infected hosts.

Specific initiation signals may also be required for efficient translation of protein,
polypeptide or peptide coding sequences. These signals include the ATG initiation codon and
adjacent sequences. Exogenous translational control signals, including the ATG initiation codon,
may additionally need to be provided. One of ordinary skill in the art would readily be capable
of determining this and providing the necessary signals. It is well known that the initiation
codon must be in-frame (or in-phase) with the reading frame of the desired coding sequence to
ensure translation of the entire insert. These exogenous translational control signals and
initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of
expression may be enhanced by the inclusion of appropriate transcription enhancer elements and
transcription terminators.
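
By way of illustration only, the in-frame requirement stated above can be checked with a short
script. The following Python sketch is not part of the disclosed methods; the function names, the
0-based index convention and the example sequence are assumptions chosen for the illustration. It
tests whether an exogenous ATG is in phase with the first codon of an insert, and locates the
first stop codon encountered when reading from that ATG.

```python
# Illustrative sketch only: names and the example construct are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(atg_index: int, insert_start: int) -> bool:
    """True if the insert's first codon is in phase with the ATG.

    Both arguments are 0-based positions in the same construct; the ATG is
    in frame when the distance between them is a multiple of three.
    """
    return (insert_start - atg_index) % 3 == 0


def first_in_frame_stop(seq: str, atg_index: int):
    """Return the index of the first stop codon read in the ATG's frame.

    Returns None if no in-frame stop codon is found before the end of seq.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    for i in range(atg_index, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical construct: 2 nt leader, ATG, one linker codon, 3-codon insert, stop.
    construct = "CC" + "ATG" + "GGA" + "AAACCCGGG" + "TAA"
    print(is_in_frame(atg_index=2, insert_start=8))
    print(first_in_frame_stop(construct, atg_index=2))
```

If the insert were shifted by one or two nucleotides relative to the ATG, `is_in_frame` would
return False, indicating that the entire insert would not be translated in the desired frame.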

In eukaryotic expression, one will also typically desire to incorporate into the
transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not
contained within the original cloned segment. Typically, the poly A addition site is placed about
30 to 2000 nucleotides "downstream" of the termination site of the protein at a position prior to
transcription termination.
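
As a non-limiting illustration of the placement described above, the following Python sketch
scans a sequence for the 5'-AATAAA-3' signal and reports those occurrences lying about 30 to 2000
nucleotides downstream of the translation termination site. The function name, the convention of
measuring distance from the end of the stop codon to the start of the signal, and the example
sequence are assumptions made for the illustration.

```python
# Illustrative sketch only: names, distance convention and example are hypothetical.
POLYA_SIGNAL = "AATAAA"


def polya_sites_in_window(seq: str, stop_end: int,
                          min_dist: int = 30, max_dist: int = 2000):
    """List (position, distance) pairs for AATAAA signals in the window.

    stop_end is the 0-based index of the first nucleotide after the stop
    codon; distance is measured from stop_end to the start of the signal.
    """
    seq = seq.upper()
    hits = []
    pos = seq.find(POLYA_SIGNAL, stop_end)
    while pos != -1:
        dist = pos - stop_end
        if min_dist <= dist <= max_dist:
            hits.append((pos, dist))
        pos = seq.find(POLYA_SIGNAL, pos + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical transcript: short ORF, 40 nt spacer, signal, short tail.
    transcript = "ATGGCCTAA" + "C" * 40 + "AATAAA" + "C" * 10
    print(polya_sites_in_window(transcript, stop_end=9))
```

An empty result would suggest that a polyadenylation site may need to be added to the
transcriptional unit.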

C. Gene Delivery

The general approach to the aspects of the present invention is to provide a cell with nucleic 
acid encoding a fusion protein, polypeptide or peptide and/or a nucleic acid encoding a protein, 
polypeptide or peptide whose activity may be altered by complementation with the fusion protein, 
thereby permitting a detectable change in the activity of the proteins to take effect. While it is 
conceivable that the protein(s) may be delivered directly, a preferred embodiment involves 
providing a nucleic acid encoding the protein(s), polypeptide(s) or peptide(s) to the cell. Following
this provision, the polypeptide(s) are synthesized by the transcriptional and translational machinery 
of the cell, as well as any that may be provided by the expression construct. 

In certain embodiments of the invention, the nucleic acid encoding the gene may be 
stably integrated into the genome of the cell. In yet further embodiments, the nucleic acid may 
be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid 
segments or "episomes" encode sequences sufficient to permit maintenance and replication 
independent of or in synchronization with the host cell cycle. How the expression construct is 
delivered to a cell and where in the cell the nucleic acid remains is dependent on the type of 
expression construct employed. 



1. DNA Delivery Using Viral Vectors

The ability of certain viruses to infect cells or enter cells via receptor-mediated
endocytosis, and to integrate into the host cell genome and express viral genes stably and
efficiently, has made them attractive candidates for the transfer of foreign genes into mammalian
cells.
Preferred vectors of the present invention will generally be viral vectors. 

Although some viruses that can accept foreign genetic material are limited in the number 
of nucleotides they can accommodate and in the range of cells they infect, these viruses have 
been demonstrated to successfully effect gene expression. However, adenoviruses do not 
integrate their genetic material into the host genome and therefore do not require host replication 
for gene expression, making them ideally suited for rapid, efficient, heterologous gene 
expression. Techniques for preparing replication-defective infective viruses are well known in 
the art.

Of course, in using viral delivery systems, one will desire to purify the virion sufficiently
to render it essentially free of undesirable contaminants, such as defective interfering viral 
particles or endotoxins and other pyrogens such that it will not cause any untoward reactions in 
the cell, animal or individual receiving the vector construct. A preferred means of purifying the 
vector involves the use of buoyant density gradients, such as cesium chloride gradient 
centrifugation. 



a. Adenoviral Vectors 

A particular method for delivery of the expression constructs involves the use of an
adenovirus expression vector. Although adenovirus vectors are known to have a low capacity
for integration into genomic DNA, this feature is counterbalanced by the high efficiency of gene
transfer afforded by these vectors. "Adenovirus expression vector" is meant to include those
constructs containing adenovirus sequences sufficient to (a) support packaging of the construct
and (b) ultimately express a tissue or cell-specific construct that has been cloned therein.

The expression vector comprises a genetically engineered form of adenovirus.
Knowledge of the genetic organization of adenovirus, a 36 kb, linear, double-stranded DNA
virus, allows substitution of large pieces of adenoviral DNA with foreign sequences up to 7 kb
(Grunhaus and Horwitz, 1992). In contrast to retrovirus, the adenoviral infection of host cells
does not result in chromosomal integration because adenoviral DNA can replicate in an episomal
manner without potential genotoxicity. Also, adenoviruses are structurally stable, and no
genome rearrangement has been detected after extensive amplification.

Adenovirus is particularly suitable for use as a gene transfer vector because of its mid-sized
genome, ease of manipulation, high titer, wide target-cell range and high infectivity. Both
ends of the viral genome contain 100-200 base pair inverted repeats (ITRs), which are cis
elements necessary for viral DNA replication and packaging. The early (E) and late (L) regions
of the genome contain different transcription units that are divided by the onset of viral DNA
replication. The E1 region (E1A and E1B) encodes proteins responsible for the regulation of
transcription of the viral genome and a few cellular genes. The expression of the E2 region (E2A
and E2B) results in the synthesis of the proteins for viral DNA replication. These proteins are
involved in DNA replication, late gene expression and host cell shut-off (Renan, 1990). The
products of the late genes, including the majority of the viral capsid proteins, are expressed only
after significant processing of a single primary transcript issued by the major late promoter
(MLP). The MLP (located at 16.8 m.u.) is particularly efficient during the late phase of
infection, and all the mRNAs issued from this promoter possess a 5'-tripartite leader (TPL)
sequence which makes them preferred mRNAs for translation.

In a current system, recombinant adenovirus is generated from homologous 
recombination between shuttle vector and provirus vector. Due to the possible recombination 
between two proviral vectors, wild-type adenovirus may be generated from this process. 
Therefore, it is critical to isolate a single clone of virus from an individual plaque and examine 
its genomic structure. 

Generation and propagation of the current adenovirus vectors, which are replication 
deficient, depend on a unique helper cell line, designated 293, which was transformed from 
human embryonic kidney cells by Ad5 DNA fragments and constitutively expresses E1 proteins
(E1A and E1B; Graham et al., 1977). Since the E3 region is dispensable from the adenovirus
genome (Jones and Shenk, 1978), the current adenovirus vectors, with the help of 293 cells,
carry foreign DNA in either the E1, the E3 or both regions (Graham and Prevec, 1991).
Recently, adenoviral vectors comprising deletions in the E4 region have been described (U.S. 
Patent 5,670,488, incorporated herein by reference). 

In nature, adenovirus can package approximately 105% of the wild-type genome
(Ghosh-Choudhury et al., 1987), providing capacity for about 2 extra kb of DNA. Combined with the
approximately 5.5 kb of DNA that is replaceable in the E1 and E3 regions, the maximum
capacity of the current adenovirus vector is under 7.5 kb, or about 15% of the total length of the
vector. More than 80% of the adenovirus viral genome remains in the vector backbone.
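
The capacity estimate above may be reproduced, purely for illustration, with the following
Python sketch. It uses the 36 kb genome size, the approximately 105% packaging limit and the
approximately 5.5 kb of replaceable E1 and E3 sequence stated herein; the function and constant
names are assumptions made for the illustration.

```python
# Illustrative arithmetic only: the numeric values come from the text above,
# while the function and constant names are hypothetical.
GENOME_KB = 36.0             # wild-type adenovirus genome size
PACKAGING_LIMIT = 1.05       # approximately 105% of the wild-type genome
REPLACEABLE_E1_E3_KB = 5.5   # DNA replaceable in the E1 and E3 regions


def extra_packaging_kb(genome_kb: float = GENOME_KB,
                       limit: float = PACKAGING_LIMIT) -> float:
    """DNA that can be packaged beyond the wild-type genome length, in kb."""
    return genome_kb * (limit - 1.0)


def max_insert_kb(genome_kb: float = GENOME_KB,
                  limit: float = PACKAGING_LIMIT,
                  replaceable_kb: float = REPLACEABLE_E1_E3_KB) -> float:
    """Approximate foreign DNA capacity: extra packaging plus replaceable DNA."""
    return extra_packaging_kb(genome_kb, limit) + replaceable_kb


if __name__ == "__main__":
    print(round(extra_packaging_kb(), 1))  # extra capacity in kb
    print(round(max_insert_kb(), 1))       # total capacity in kb
```

With these inputs the extra packaging capacity works out to 1.8 kb, which the text rounds to
about 2 kb, and the total remains under 7.5 kb.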

Helper cell lines may be derived from human cells such as human embryonic kidney 
cells, muscle cells, hematopoietic cells or other human embryonic mesenchymal or epithelial
cells. Alternatively, the helper cells may be derived from the cells of other mammalian species
that are permissive for human adenovirus. Such cells include, e.g., Vero cells or other monkey 
embryonic mesenchymal or epithelial cells. As stated above, the preferred helper cell line is 293. 

Racher et al. (1995) disclosed improved methods for culturing 293 cells and propagating 
adenovirus. In one format, natural cell aggregates are grown by inoculating individual cells into 
1 liter siliconized spinner flasks (Techne, Cambridge, UK) containing 100-200 ml of medium. 
Following stirring at 40 rpm, the cell viability is estimated with trypan blue. In another format, 
Fibra-Cel microcarriers (Bibby Sterlin, Stone, UK) (5 g/l) are employed as follows. A cell
inoculum, resuspended in 5 ml of medium, is added to the carrier (50 ml) in a 250 ml Erlenmeyer 
flask and left stationary, with occasional agitation, for 1 to 4 h. The medium is then replaced 
with 50 ml of fresh medium and shaking initiated. For virus production, cells are allowed to 
grow to about 80% confluence, after which time the medium is replaced (to 25% of the final 
volume) and adenovirus added at an MOI of 0.05. Cultures are left stationary overnight, 
following which the volume is increased to 100% and shaking commenced for another 72 h. 
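
For illustration only, the quantities implied by the infection step above can be computed as
follows. The Python sketch assumes the usual definition of multiplicity of infection (MOI) as
infectious units per cell; the cell count and final culture volume used in the example are
hypothetical values not taken from Racher et al. (1995), and the function names are likewise
assumptions.

```python
# Illustrative sketch only: cell count and final volume below are hypothetical.
def virus_needed_pfu(cell_count: float, moi: float = 0.05) -> float:
    """Infectious units (pfu) to add so that pfu per cell equals the MOI."""
    return cell_count * moi


def infection_volume_ml(final_volume_ml: float, fraction: float = 0.25) -> float:
    """Medium volume during infection, given as a fraction of the final volume."""
    return final_volume_ml * fraction


if __name__ == "__main__":
    cells = 2e7            # hypothetical number of cells at about 80% confluence
    final_volume = 50.0    # hypothetical final culture volume in ml
    print(virus_needed_pfu(cells))            # pfu of adenovirus to add
    print(infection_volume_ml(final_volume))  # ml of medium during infection
```

The volume is restored to 100% of the final volume after the overnight stationary incubation.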

Other than the requirement that the adenovirus vector be replication defective, or at least 
conditionally defective, the nature of the adenovirus vector is not believed to be crucial to the 
successful practice of the invention. The adenovirus may be of any of the 42 different known 
serotypes or subgroups A-F. Adenovirus type 5 of subgroup C is the preferred starting material 
in order to obtain the conditional replication-defective adenovirus vector for use in the present 
invention. This is because Adenovirus type 5 is a human adenovirus about which a great deal of 
biochemical and genetic information is known, and it has historically been used for most 
constructions employing adenovirus as a vector. 

As stated above, the typical vector according to the present invention is replication
defective and will not have an adenovirus E1 region. Thus, it will be most convenient to
introduce the transforming construct at the position from which the E1-coding sequences have
been removed. However, the position of insertion of the construct within the adenovirus
sequences is not critical to the invention. The polynucleotide encoding the gene of interest may
also be inserted in lieu of the deleted E3 region in E3 replacement vectors as described by
Karlsson et al. (1986) or in the E4 region where a helper cell line or helper virus complements
the E4 defect.

Adenovirus growth and manipulation are known to those of skill in the art, and adenovirus
exhibits a broad host range in vitro and in vivo. This group of viruses can be obtained in high
titers, e.g., 10^9 to 10^11 plaque-forming units per ml, and they are highly infective. The life
cycle of adenovirus does not require integration into the host cell genome. The foreign genes
delivered by adenovirus vectors are episomal and, therefore, have low genotoxicity to host cells.

Adenovirus vectors have been used in eukaryotic gene expression (Levrero et al., 1991;
Gomez-Foix et al., 1992) and vaccine development (Grunhaus and Horwitz, 1992; Graham and
Prevec, 1992). Recombinant adenovirus and adeno-associated virus (see below) can both infect
and transduce non-dividing human primary cells.

b. AAV Vectors

Adeno-associated virus (AAV) is an attractive vector system for use in the cell
transduction of the present invention as it has a high frequency of integration and it can infect
nondividing cells, thus making it useful for delivery of genes into mammalian cells, for example,
in tissue culture (Muzyczka, 1992) or in vivo. AAV has a broad host range for infectivity
(Tratschin et al., 1984; Laughlin et al., 1986; Lebkowski et al., 1988; McLaughlin et al., 1988).
Details concerning the generation and use of rAAV vectors are described in U.S. Patent No.
5,139,941 and U.S. Patent No. 4,797,368, each incorporated herein by reference.

Studies demonstrating the use of AAV in gene delivery include LaFace et al. (1988);
Zhou et al. (1993); Flotte et al. (1993); and Walsh et al. (1994). Recombinant AAV vectors
have been used successfully for in vitro and in vivo transduction of marker genes (Kaplitt et al.,
1994; Lebkowski et al., 1988; Samulski et al., 1989; Yoder et al., 1994; Zhou et al., 1994;
Hermonat and Muzyczka, 1984; Tratschin et al., 1985; McLaughlin et al., 1988) and genes
involved in human diseases (Flotte et al., 1992; Luo et al., 1994; Ohi et al., 1990; Walsh et al.,
1994; Wei et al., 1994). Recently, an AAV vector has been approved for phase I human trials
for the treatment of cystic fibrosis.

AAV is a dependent parvovirus in that it requires coinfection with another virus (either
adenovirus or a member of the herpes virus family) to undergo a productive infection in cultured
cells (Muzyczka, 1992). In the absence of coinfection with helper virus, the wild type AAV
genome integrates through its ends into human chromosome 19 where it resides in a latent state
as a provirus (Kotin et al., 1990; Samulski et al., 1991). rAAV, however, is not restricted to
chromosome 19 for integration unless the AAV Rep protein is also expressed (Shelling and
Smith, 1994). When a cell carrying an AAV provirus is superinfected with a helper virus, the
AAV genome is "rescued" from the chromosome or from a recombinant plasmid, and a normal
productive infection is established (Samulski et al., 1989; McLaughlin et al., 1988; Kotin et al.,
1990; Muzyczka, 1992).

Typically, recombinant AAV (rAAV) virus is made by cotransfecting a plasmid
containing the gene of interest flanked by the two AAV terminal repeats (McLaughlin et al.,
1988; Samulski et al., 1989; each incorporated herein by reference) and an expression plasmid
containing the wild type AAV coding sequences without the terminal repeats, for example
pIM45 (McCarty et al., 1991; incorporated herein by reference). The cells are also infected or
transfected with adenovirus or plasmids carrying the adenovirus genes required for AAV helper
function. rAAV virus stocks made in such fashion are contaminated with adenovirus which must
be physically separated from the rAAV particles (for example, by cesium chloride density
centrifugation). Alternatively, adenovirus vectors containing the AAV coding regions or cell
lines containing the AAV coding regions and some or all of the adenovirus helper genes could be
used (Yang et al., 1994; Clark et al., 1995). Cell lines carrying the rAAV DNA as an integrated
provirus can also be used (Flotte et al., 1995).

c. Retroviral Vectors

Retroviruses have promise as gene delivery vectors due to their ability to integrate their
genes into the host genome, transfer a large amount of foreign genetic material, infect a
broad spectrum of species and cell types, and be packaged in special cell lines (Miller,
1992).

The retroviruses are a group of single-stranded RNA viruses characterized by an ability
to convert their RNA to double-stranded DNA in infected cells by a process of
reverse-transcription (Coffin, 1990). The resulting DNA then stably integrates into cellular
chromosomes as a provirus and directs synthesis of viral proteins. The integration results in the
retention of the viral gene sequences in the recipient cell and its descendants. The retroviral
genome contains three genes, gag, pol, and env that code for capsid proteins, polymerase
enzyme, and envelope components, respectively. A sequence found upstream from the gag gene
contains a signal for packaging of the genome into virions. Two long terminal repeat (LTR)
sequences are present at the 5' and 3' ends of the viral genome. These contain strong promoter
and enhancer sequences and are also required for integration in the host cell genome (Coffin,
1990).

In order to construct a retroviral vector, a nucleic acid encoding a gene of interest is
inserted into the viral genome in the place of certain viral sequences to produce a virus that is
replication-defective. In order to produce virions, a packaging cell line containing the gag, pol,
and env genes but without the LTR and packaging components is constructed (Mann et al.,
1983). When a recombinant plasmid containing a cDNA, together with the retroviral LTR and
packaging sequences, is introduced into this cell line (by calcium phosphate precipitation, for
example), the packaging sequence allows the RNA transcript of the recombinant plasmid to be
packaged into viral particles, which are then secreted into the culture media (Nicolas and
Rubenstein, 1988; Temin, 1986; Mann et al., 1983). The media containing the recombinant
retroviruses are then collected, optionally concentrated, and used for gene transfer. Retroviral
vectors are able to infect a broad variety of cell types. However, integration and stable
expression require the division of host cells (Paskind et al., 1975).

A concern with the use of defective retrovirus vectors is the potential appearance of
wild-type replication-competent virus in the packaging cells. This can result from recombination
events in which the intact sequence from the recombinant virus inserts upstream from the gag,
pol, env sequence integrated in the host cell genome. However, new packaging cell lines are
now available that should greatly decrease the likelihood of recombination (Markowitz et al.,
1988; Hersdorffer et al., 1990).

Gene delivery using second generation retroviral vectors has been reported. Kasahara
et al. (1994) prepared an engineered variant of the Moloney murine leukemia virus, which
normally infects only mouse cells, and modified an envelope protein so that the virus specifically
bound to, and infected, human cells bearing the erythropoietin (EPO) receptor. This was
achieved by inserting a portion of the EPO sequence into an envelope protein to create a
chimeric protein with a new binding specificity.

d. Other Viral Vectors 

Other viral vectors may be employed as expression constructs in the present invention.
Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Baichwal and Sugden,
1986; Coupar et al., 1988), sindbis virus, cytomegalovirus and herpes simplex virus may be
employed. They offer several attractive features for various mammalian cells (Friedmann, 1989;
Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al., 1988; Horwich et al., 1990).

With the recent recognition of defective hepatitis B viruses, new insight was gained into
the structure-function relationship of different viral sequences. In vitro studies showed that the
virus could retain the ability for helper-dependent packaging and reverse transcription despite the
deletion of up to 80% of its genome (Horwich et al., 1990). This suggested that large portions of
the genome could be replaced with foreign genetic material. Chang et al. recently introduced the
chloramphenicol acetyltransferase (CAT) gene into duck hepatitis B virus genome in the place of
the polymerase, surface, and pre-surface coding sequences. It was cotransfected with wild-type
virus into an avian hepatoma cell line. Culture media containing high titers of the recombinant
virus were used to infect primary duckling hepatocytes. Stable CAT gene expression was
detected for at least 24 days after transfection (Chang et al., 1991).

In certain further embodiments, the vector will be HSV. A factor that makes HSV an
attractive vector is the size and organization of the genome. Because HSV is large, incorporation
of multiple genes or expression cassettes is less problematic than in other smaller viral systems.
In addition, the availability of different viral control sequences with varying performance
(temporal, strength, etc.) makes it possible to control expression to a greater extent than in other
systems. It also is an advantage that the virus has relatively few spliced messages, further easing
genetic manipulations. HSV also is relatively easy to manipulate and can be grown to high titers.
Thus, delivery is less of a problem, both in terms of volumes needed to attain sufficient MOI and
in a lessened need for repeat dosings.
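
The link between titer, inoculum volume and multiplicity of infection (MOI) can be made
concrete with a short calculation. The following Python sketch is illustrative only: the function
name, the titers and the cell number are assumptions chosen for the example and are not values
taken from this disclosure.

```python
def inoculum_volume_ml(target_moi, cell_count, titer_per_ml):
    """Volume of virus stock (mL) that supplies target_moi infectious units per cell.

    MOI = (titer * volume) / cells, so volume = MOI * cells / titer.
    """
    if target_moi < 0 or cell_count <= 0 or titer_per_ml <= 0:
        raise ValueError("MOI must be >= 0; cell count and titer must be > 0")
    return target_moi * cell_count / titer_per_ml


# Hypothetical comparison: the same target MOI with a low-titer and a high-titer stock.
low_titer_volume = inoculum_volume_ml(target_moi=5, cell_count=1e6, titer_per_ml=1e7)
high_titer_volume = inoculum_volume_ml(target_moi=5, cell_count=1e6, titer_per_ml=1e9)
print(low_titer_volume, high_titer_volume)
```

In this hypothetical case, a hundredfold higher titer reduces the required inoculum volume by the
same factor, which is the sense in which high titers ease delivery.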

e. Modified Viruses 

In still further embodiments of the present invention, the nucleic acids to be delivered are
housed within an infective virus that has been engineered to express a specific binding ligand.
The virus particle will thus bind specifically to the cognate receptors of the target cell and deliver
the contents to the cell. A novel approach designed to allow specific targeting of retrovirus
vectors was recently developed based on the chemical modification of a retrovirus by the
chemical addition of lactose residues to the viral envelope. This modification can permit the
specific infection of hepatocytes via sialoglycoprotein receptors.

Another approach to targeting of recombinant retroviruses was designed in which
biotinylated antibodies against a retroviral envelope protein and against a specific cell receptor
were used. The antibodies were coupled via the biotin components by using streptavidin (Roux
et al., 1989). Using antibodies against major histocompatibility complex class I and class II
antigens, they demonstrated the infection of a variety of human cells that bore those surface
antigens with an ecotropic virus in vitro (Roux et al., 1989).



2. Other Methods of DNA Delivery 

In various embodiments of the invention, DNA is delivered to a cell as an expression
construct. In order to effect expression of a gene construct, the expression construct must be
delivered into a cell. As described herein, the preferred mechanism for delivery is via viral
infection, where the expression construct is encapsidated in an infectious viral particle.
However, several non-viral methods for the transfer of expression constructs into cells also are
contemplated by the present invention. In one embodiment of the present invention, the
expression construct may consist only of naked recombinant DNA or plasmids. Transfer of the
construct may be performed by any of the methods mentioned which physically or chemically
permeabilize the cell membrane. Some of these techniques may be successfully adapted for in
vivo or ex vivo use, as discussed below.



a. Liposome-Mediated Transfection 

In a further embodiment of the invention, the expression construct may be entrapped in a 
liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane 
and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by 
aqueous medium. They form spontaneously when phospholipids are suspended in an excess of 
aqueous solution. The lipid components undergo self-rearrangement before the formation of 
closed structures and entrap water and dissolved solutes between the lipid bilayers (Ghosh and 
Bachhawat, 1991). Also contemplated is an expression construct complexed with Lipofectamine 
(Gibco BRL). 

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro has
been very successful (Nicolau and Sene, 1982; Fraley et al., 1979; Nicolau et al., 1987). Wong
et al. (1980) demonstrated the feasibility of liposome-mediated delivery and expression of
foreign DNA in cultured chick embryo, HeLa and hepatoma cells.

In certain embodiments of the invention, the liposome may be complexed with a
hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane
and promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989). In other
embodiments, the liposome may be complexed or employed in conjunction with nuclear
non-histone chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further embodiments, the
liposome may be complexed or employed in conjunction with both HVJ and HMG-1. In other
embodiments, the delivery vehicle may comprise a ligand and a liposome. Where a bacterial
promoter is employed in the DNA construct, it also will be desirable to include within the liposome
an appropriate bacterial polymerase.

b. Electroporation

In certain embodiments of the present invention, the expression construct is introduced 
into the cell via electroporation. Electroporation involves the exposure of a suspension of cells 
and DNA to a high-voltage electric discharge. 

Transfection of eukaryotic cells using electroporation has been quite successful. Mouse
pre-B lymphocytes have been transfected with human kappa-immunoglobulin genes (Potter
et al., 1984), and rat hepatocytes have been transfected with the chloramphenicol
acetyltransferase gene (Tur-Kaspa et al., 1986) in this manner.

c. Calcium Phosphate or DEAE-Dextran 

In other embodiments of the present invention, the expression construct is introduced to
the cells using calcium phosphate precipitation. Human KB cells have been transfected with
adenovirus 5 DNA (Graham and Van Der Eb, 1973) using this technique. Also in this manner,
mouse L(A9), mouse C127, CHO, CV-1, BHK, NIH3T3 and HeLa cells were transfected with a
neomycin marker gene (Chen and Okayama, 1987), and rat hepatocytes were transfected with a
variety of marker genes (Rippe et al., 1990).

In another embodiment, the expression construct is delivered into the cell using
DEAE-dextran followed by polyethylene glycol. In this manner, reporter plasmids were introduced
into mouse myeloma and erythroleukemia cells (Gopal, 1985).

d. Particle Bombardment 

Another embodiment of the invention for transferring a naked DNA expression construct
into cells may involve particle bombardment. This method depends on the ability to accelerate
DNA-coated microprojectiles to a high velocity allowing them to pierce cell membranes and
enter cells without killing them (Klein et al., 1987). Several devices for accelerating small
particles have been developed. One such device relies on a high voltage discharge to generate an
electrical current, which in turn provides the motive force (Yang et al., 1990). The
microprojectiles used have consisted of biologically inert substances such as tungsten or gold
beads.

e. Direct Microinjection or Sonication Loading

Further embodiments of the present invention include the introduction of the expression
construct by direct microinjection or sonication loading. Direct microinjection has been used to
introduce nucleic acid constructs into Xenopus oocytes (Harland and Weintraub, 1985), and
LTK⁻ fibroblasts have been transfected with the thymidine kinase gene by sonication loading
(Fechheimer et al., 1987).

f. Adenoviral Assisted Transfection 

In certain embodiments of the present invention, the expression construct is introduced
into the cell using adenovirus assisted transfection. Increased transfection efficiencies have been
reported in cell systems using adenovirus coupled systems (Kelleher and Vos, 1994; Cotten
et al., 1992; Curiel, 1994).

g. Receptor Mediated Transfection 

Still further expression constructs that may be employed to deliver nucleic acid constructs
to target cells are receptor-mediated delivery vehicles. These take advantage of the selective
uptake of macromolecules by receptor-mediated endocytosis that will be occurring in the target
cells. In view of the cell type-specific distribution of various receptors, this delivery method
adds another degree of specificity to the present invention. Specific delivery in the context of
another mammalian cell type is described by Wu and Wu (1993; incorporated herein by
reference).

Certain receptor-mediated gene targeting vehicles comprise a cell receptor-specific ligand
and a DNA-binding agent. Others comprise a cell receptor-specific ligand to which the DNA
construct to be delivered has been operatively attached. Several ligands have been used for
receptor-mediated gene transfer (Wu and Wu, 1987; Wagner et al., 1990; Perales et al., 1994;
Myers, EPO 0273085), which establishes the operability of the technique. In certain aspects of
the present invention, the ligand will be chosen to correspond to a receptor specifically expressed
on the EOE target cell population.

In other embodiments, the DNA delivery vehicle component of a cell-specific gene
targeting vehicle may comprise a specific binding ligand in combination with a liposome. The
nucleic acids to be delivered are housed within the liposome and the specific binding ligand is
functionally incorporated into the liposome membrane. The liposome will thus specifically bind
to the receptors of the target cell and deliver the contents to the cell. Such systems have been
shown to be functional in cases where, for example, epidermal growth factor (EGF) is
used in the receptor-mediated delivery of a nucleic acid to cells that exhibit upregulation of the
EGF receptor.

In still further embodiments, the DNA delivery vehicle component of the targeted 
delivery vehicles may be a liposome itself, which will preferably comprise one or more lipids or 
glycoproteins that direct cell-specific binding. For example, Nicolau et al. (1987) employed
lactosyl-ceramide, a galactose-terminal asialoganglioside, incorporated into liposomes and
observed an increase in the uptake of the insulin gene by hepatocytes. It is contemplated that the 
tissue-specific transforming constructs of the present invention can be specifically delivered into 
the target cells in a similar manner. 

h. Homologous Recombination 

Homologous recombination (Koller and Smithies, 1992) allows the precise modification 
of existing genes, overcomes the problems of positional effects and insertional inactivation, and 
allows the inactivation of specific genes, as well as the replacement of one gene for another. 
Methods for homologous recombination are described in U. S. Patent 5,614,396, incorporated 
herein in its entirety by reference. 

Thus a preferred method for the delivery of transgenic constructs involves the use of 
homologous recombination. Homologous recombination relies, like antisense, on the tendency 
of nucleic acids to base pair with complementary sequences. In this instance, the base pairing 
serves to facilitate the interaction of two separate nucleic acid molecules so that strand breakage 
and repair can take place. In other words, the "homologous" aspect of the method relies on 
sequence homology to bring two complementary sequences into close proximity, while the
"recombination" aspect provides for one complementary sequence to replace the other by virtue
of the breaking of certain bonds and the formation of others. 

Put into practice, homologous recombination is used as follows. First, a site for 
integration is selected within the host cell. Sequences homologous to the integration site are then 
included in a genetic construct, flanking the selected gene to be integrated into the genome. 
Flanking, in this context, simply means that target homologous sequences are located both 
upstream (5') and downstream (3') of the selected gene. These sequences should correspond to 
some sequences upstream and downstream of the target gene. The construct is then introduced 
into the cell, thus permitting recombination between the cellular sequences and the construct. 

As a practical matter, the genetic construct will normally act as far more than a vehicle to 
insert the gene into the genome. For example, it is important to be able to select for 
recombinants and, therefore, it is common to include within the construct a selectable marker 
gene. This gene permits selection of cells that have integrated the construct into their genomic 
DNA by conferring resistance to various biostatic and biocidal drugs. In addition, this technique 
may be used to "knock-out" (delete) or interrupt a particular gene. Thus, another approach for 
altering or mutating a gene involves the use of homologous recombination, or "knock-out 
technology". This is accomplished by including a mutated or vastly deleted form of the 
heterologous gene between the flanking regions within the construct. The arrangement of a 
construct to effect homologous recombination might be as follows: 

...vector•5'-flanking sequence•selected gene•selectable marker gene•flanking sequence-3'•vector...

Thus, using this kind of construct, it is possible, in a single recombinatorial event, to (i) 
"knock out" an endogenous gene, (ii) provide a selectable marker for identifying such an event 
and (iii) introduce a transgene for expression. 
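
The arrangement described above can be sketched as an ordered list of elements. The Python
below is a minimal illustration, not a design tool: the class name, role labels and function
names are hypothetical, and the only logic shown is that the material lying between the two
flanking sequences is what a homologous event is expected to carry into the genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    role: str  # one of "vector", "flank", "gene", "marker" (labels are illustrative)


def build_construct(selected_gene, marker):
    """Order the elements as in the arrangement given in the text."""
    return [
        Element("vector", "vector"),
        Element("5'-flanking sequence", "flank"),
        Element(selected_gene, "gene"),
        Element(marker, "marker"),
        Element("flanking sequence-3'", "flank"),
        Element("vector", "vector"),
    ]


def integrated_region(construct):
    """Return the elements located between the two flanking sequences."""
    flanks = [i for i, e in enumerate(construct) if e.role == "flank"]
    if len(flanks) != 2:
        raise ValueError("expected exactly two flanking sequences")
    return construct[flanks[0] + 1:flanks[1]]


def render(construct):
    """Write the construct in the linear notation used in the text."""
    return "..." + "•".join(e.name for e in construct) + "..."


construct = build_construct("selected gene", "selectable marker gene")
print(render(construct))
print([e.name for e in integrated_region(construct)])
```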

Another refinement of the homologous recombination approach involves the use of a
"negative" selectable marker. One example is the use of the cytosine deaminase gene in a
negative selection method as described in U.S. Patent No. 5,624,830. The negative selection
marker, unlike the selectable marker, causes death of cells which express the marker. Thus, it is
used to identify undesirable recombination events. When seeking to select homologous
recombinants using a selectable marker, it is difficult in the initial screening step to distinguish
proper homologous recombinants from recombinants generated from random, non-sequence
specific events. These recombinants also may contain the selectable marker gene and may
express the heterologous protein of interest, but will, in all likelihood, not have the desired
phenotype. By attaching a negative selectable marker to the construct, but outside of the
flanking regions, one can select against many random recombination events that will incorporate
the negative selectable marker. Homologous recombination should not introduce the negative
selectable marker, as it is outside of the flanking sequences.
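
The combined positive and negative selection logic can be summarized in a few lines of Python.
This is a simplified sketch under stated assumptions: each cell is reduced to two booleans, the
event names are hypothetical labels, and random integrants that happen not to carry the negative
marker (which the text allows for by saying "many", not all) are not modelled.

```python
def survives_selection(has_selectable_marker, has_negative_marker):
    """A cell survives only if it carries the selectable marker and lacks the negative marker."""
    return has_selectable_marker and not has_negative_marker


# (selectable marker present, negative marker present) for three idealized outcomes
events = {
    "homologous recombinant": (True, False),             # only the flanked region integrates
    "random integrant of whole construct": (True, True),  # negative marker comes along
    "no integration": (False, False),
}

survivors = [name for name, (pos, neg) in events.items() if survives_selection(pos, neg)]
print(survivors)
```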

3. Marker genes

In certain aspects of the present invention, specific cells are tagged with specific genetic
markers to provide information about the fate of the tagged cells. Therefore, the present
invention also provides recombinant candidate screening and selection methods which are based
upon whole cell assays and which, preferably, employ a reporter gene that confers on its
recombinant hosts a readily detectable phenotype that emerges only under conditions where a
general DNA promoter positioned upstream of the reporter gene is functional. Generally,
reporter genes encode a polypeptide (marker protein) not otherwise produced by the host cell
which is detectable by analysis of the cell culture, e.g., by fluorometric, radioisotopic or
spectrophotometric analysis of the cell culture.

In other aspects of the present invention, a genetic marker is provided which is detectable
by standard genetic analysis techniques, such as DNA amplification by PCR™ or hybridization
using fluorometric, radioisotopic or spectrophotometric probes.

a. Screening 

Exemplary enzymes include esterases, phosphatases, proteases (tissue plasminogen
activator or urokinase) and other enzymes capable of being detected by their activity, as will be
known to those skilled in the art. Contemplated for use in the present invention is green
fluorescent protein (GFP) as a marker for transgene expression (Chalfie et al., 1994). The use of
GFP does not need exogenously added substrates, only irradiation by near UV or blue light, and
thus has significant potential for use in monitoring gene expression in living cells.

Other particular examples are the enzyme chloramphenicol acetyltransferase (CAT),
which may be employed with a radiolabeled substrate, firefly and bacterial luciferase, and the
bacterial enzymes β-galactosidase and β-glucuronidase. Other marker genes within this class are
well known to those of skill in the art, and are suitable for use in the present invention.

b. Selection

Another class of reporter genes which confer detectable characteristics on a host cell are
those which encode polypeptides, generally enzymes, which render their transformants resistant
against toxins. Examples of this class of reporter genes are the neo gene (Colberre-Garapin
et al., 1981) which protects host cells against toxic levels of the antibiotic G418, the gene
conferring streptomycin resistance (U. S. Patent 4,430,434), the gene conferring hygromycin B
resistance (Santerre et al., 1984; U. S. Patents 4,727,028, 4,960,704 and 4,559,302), a gene
encoding dihydrofolate reductase, which confers resistance to methotrexate (Alt et al., 1978), the
enzyme HPRT, along with many others well known in the art (Kaufman, 1990).

D. Culture System

For long-term, high-yield production of a recombinant protein, polypeptide or peptide,
stable expression is preferred. For example, cell lines that stably express constructs encoding a
protein, polypeptide or peptide may be engineered. Rather than using expression vectors that
contain viral origins of replication, host cells can be transformed with vectors controlled by
appropriate expression control elements (e.g., promoter, enhancer sequences, transcription
terminators, polyadenylation sites, etc.), and a selectable marker. Following the introduction of
foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and
then are switched to a selective medium. The selectable marker in the recombinant plasmid
confers resistance to the selection and allows cells to stably integrate the plasmid into their
chromosomes and grow to form foci which in turn can be cloned and expanded into cell lines.

A number of selection systems may be used, including, but not limited to, the herpes
simplex virus thymidine kinase (tk), hypoxanthine-guanine phosphoribosyltransferase (hgprt)
and adenine phosphoribosyltransferase (aprt) genes, in tk⁻, hgprt⁻ or aprt⁻ cells, respectively.
Also, antimetabolite resistance can be used as the basis of selection for dihydrofolate reductase
(dhfr), which confers resistance to methotrexate; gpt, which confers resistance to mycophenolic
acid; neomycin (neo), which confers resistance to the aminoglycoside G-418; and hygromycin
(hygro), which confers resistance to hygromycin.
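
For bookkeeping purposes, the selection systems listed above can be held in a small lookup
table. The Python sketch below uses only the marker names, host genotypes and agents named in
the preceding paragraph; the dictionary layout and the function name are illustrative
assumptions.

```python
# marker -> (kind of selection, host genotype or selective agent), as listed in the text
SELECTION_SYSTEMS = {
    "tk": ("host genotype", "tk-"),
    "hgprt": ("host genotype", "hgprt-"),
    "aprt": ("host genotype", "aprt-"),
    "dhfr": ("resistance", "methotrexate"),
    "gpt": ("resistance", "mycophenolic acid"),
    "neo": ("resistance", "G-418"),
    "hygro": ("resistance", "hygromycin"),
}


def selection_condition(marker):
    """Describe the condition under which the given marker permits selection."""
    kind, detail = SELECTION_SYSTEMS[marker]
    if kind == "host genotype":
        return f"use in {detail} host cells"
    return f"resistance to {detail}"


for marker in SELECTION_SYSTEMS:
    print(marker, "->", selection_condition(marker))
```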

Animal cells can be propagated in vitro in two modes: as non-anchorage dependent cells
growing in suspension throughout the bulk of the culture or as anchorage-dependent cells
requiring attachment to a solid substrate for their propagation (i.e., a monolayer type of cell
growth).

Non-anchorage dependent or suspension cultures from continuous established cell lines
are the most widely used means of large scale production of cells and cell products. However,
suspension cultured cells have limitations, such as tumorigenic potential and lower protein
production than adherent cells.

Large scale suspension culture of mammalian cells in stirred tanks is a common method
for production of recombinant proteins. Two suspension culture reactor designs are in wide use:
the stirred reactor and the airlift reactor. The stirred design has successfully been used on an
8000 liter capacity for the production of interferon. Cells are grown in a stainless steel tank with
a height-to-diameter ratio of 1:1 to 3:1. The culture is usually mixed with one or more agitators,
based on bladed disks or marine propeller patterns. Agitator systems offering less shear forces
than blades have been described. Agitation may be driven either directly or indirectly by
magnetically coupled drives. Indirect drives reduce the risk of microbial contamination through
seals on stirrer shafts.

The airlift reactor, also initially described for microbial fermentation and later adapted for
mammalian culture, relies on a gas stream to both mix and oxygenate the culture. The gas
stream enters a riser section of the reactor and drives circulation. Gas disengages at the culture
surface, causing denser liquid free of gas bubbles to travel downward in the downcomer section
of the reactor. The main advantage of this design is the simplicity and lack of need for
mechanical mixing. Typically, the height-to-diameter ratio is 10:1. The airlift reactor scales up
relatively easily, has good mass transfer of gases and generates relatively low shear forces.
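
The height-to-diameter ratios quoted for the two reactor designs translate into vessel
dimensions through the volume of a cylinder, V = (π/4)·D²·H. The Python sketch below works
this through for the 8000 liter capacity mentioned above. It is a rough illustration that assumes
a plain cylindrical vessel whose full geometric volume is used; real vessels have headspace and
shaped ends, and the function name is hypothetical.

```python
import math


def tank_dimensions(volume_liters, height_to_diameter):
    """Return (diameter_m, height_m) of a plain cylinder with the given volume and H:D ratio.

    With H = r * D, V = (pi / 4) * r * D**3, so D = (4 * V / (pi * r)) ** (1 / 3).
    """
    if volume_liters <= 0 or height_to_diameter <= 0:
        raise ValueError("volume and ratio must be positive")
    volume_m3 = volume_liters / 1000.0
    diameter = (4.0 * volume_m3 / (math.pi * height_to_diameter)) ** (1.0 / 3.0)
    return diameter, height_to_diameter * diameter


# Stirred tank ratios (1:1 and 3:1) and the typical airlift ratio (10:1) at 8000 L
for ratio in (1, 3, 10):
    d, h = tank_dimensions(8000, ratio)
    print(f"H:D = {ratio}:1 -> diameter {d:.2f} m, height {h:.2f} m")
```

For a fixed volume, a larger ratio gives a narrower and taller vessel, which is consistent with
the tall column geometry of the airlift design.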

It is contemplated that the proteins, polypeptides or peptides of the invention may be
"overexpressed", i.e., expressed in increased levels relative to its natural expression in cells.
Such overexpression may be assessed by a variety of methods, including radio-labeling and/or
protein purification. However, simple and direct methods are preferred, for example, those
involving SDS/PAGE and protein staining or western blotting, followed by quantitative analyses,
such as densitometric scanning of the resultant gel or blot. A specific increase in the level of the
recombinant protein or peptide in comparison to the level in natural cells is indicative of
overexpression, as is a relative abundance of the specific protein in relation to the other proteins
produced by the host cell and, e.g., visible on a gel.
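
The two indicators of overexpression named above, an increase relative to natural cells and
abundance relative to other host proteins, can each be expressed as a simple ratio of
densitometric band intensities. The Python sketch below is illustrative only: the function names
are hypothetical, the intensities are made-up numbers, and normalization to a loading control is
an added assumption that is not described in the text.

```python
def fold_overexpression(recombinant_band, natural_band,
                        recombinant_loading=1.0, natural_loading=1.0):
    """Ratio of the target band in recombinant cells to that in natural cells.

    Each band is first divided by a loading-control intensity (assumed, default 1.0).
    """
    if min(natural_band, recombinant_loading, natural_loading) <= 0:
        raise ValueError("reference intensities must be positive")
    return (recombinant_band / recombinant_loading) / (natural_band / natural_loading)


def fraction_of_lane(target_band, all_bands_in_lane):
    """Share of the total lane intensity contributed by the target band.

    all_bands_in_lane must include the target band itself.
    """
    total = sum(all_bands_in_lane)
    if total <= 0:
        raise ValueError("lane has no signal")
    return target_band / total


# Hypothetical densitometry readings in arbitrary units
print(fold_overexpression(1200, 150))
print(fraction_of_lane(300, [300, 100, 100, 100]))
```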

E. Complementation

The terms "structural complementation", "complementation" or "alpha complementation"
as used herein in certain embodiments refer to the ability of at least one polypeptide comprising a
protein fragment or domain to alter the activity of at least a second polypeptide comprising a
protein fragment or domain. In certain embodiments, the at least one polypeptide and the at least
one second polypeptide are derived from the same precursor protein sequence. A non-limiting
example of this is the complementation of β-galactosidase activity that occurs when the
α-fragment and the ω-fragment of β-galactosidase interact to produce an active β-galactosidase
enzymatic complex.

Other complementing protein fragments are known in the art. Non-limiting examples
include the P. falciparum thymidylate synthase and dihydrofolate reductase domains (Shallom et
al., 1999), and the alpha and beta subunits of the mitochondrial processing peptidase of different
species (Adamec et al., 1999), whose activity was detected by the use of temperature sensitive
mutant yeast strains.

Thus, it is contemplated that various peptide or polypeptide sequences may be used to
produce fusion proteins with a target protein, so that the folding of the target protein into a
soluble form can be detected via the change in activity of the complemented peptide or
polypeptide. It is also contemplated that additional complementing fragments of commonly used
or well known selectable or screenable markers may be made for use in the present invention.
Non-limiting examples of such markers include a target binding protein, such as ubiquitin; an
enzyme, such as β-galactosidase, cytochrome c, chymotrypsin inhibitor, RNase,
phosphoglycerate kinase, invertase, staphylococcal nuclease, thioredoxin C, lactose permease,
aminoacyl-tRNA synthetase, or dihydrofolate reductase; a protein inhibitor, a fluorophore or a
chromophore, such as green fluorescent protein, blue fluorescent protein, yellow fluorescent
protein, luciferase or aequorin.

It is contemplated that one or more fragments of such markers may be produced through
recombinant technology that is well known to those of skill in the art, to produce a
complementation system for assaying protein folding as described herein. In a non-limiting
example, a nucleic acid encoding an N-terminal sequence of about 250 amino acids or less of a
marker protein may be operatively associated with a nucleic acid of a protein of interest to be
folded into soluble form. Such nucleic acids may be used to construct an expression vector as
described herein, and used to complement a cell that expresses the C-terminal sequence
of the marker protein. In an alternative non-limiting example, a nucleic acid encoding a
C-terminal sequence of about 250 amino acids or less of a marker protein may be operatively
associated with a nucleic acid of a protein of interest to be folded into soluble form. Such
nucleic acids may be used to construct an expression vector as described herein, and used to
complement a cell that expresses the N-terminal sequence of the marker protein. Of
course, one of skill in the art may design nucleic acids encoding marker gene fragments of
various lengths. In certain embodiments, the marker gene fragment may encode a polypeptide or
peptide of less than about 200, about 150, about 100, about 99, about 98, about 97, about 96, 
about 95, about 94, about 93, about 92, about 91, about 90, about 89, about 88, about 87, about 
86, about 85, about 84, about 83, about 82, about 81, about 80, about 79, about 78, about 77, 
about 76, about 75, about 74, about 73, about 72, about 71, about 70, about 69, about 68, about 
67, about 66, about 65, about 64, about 63, about 62, about 61, about 60, about 59, about 58,
about 57, about 56, about 55, about 54, about 53, about 52, about 51, about 50, about 49, about
48, about 47, about 46, about 45, about 44, about 43, about 42, about 41, about 40, about 39, 
about 38, about 37, about 36, about 35, about 34, about 33, about 32, about 31, about 30, about 
29, about 28, about 27, about 26, about 25, about 24, about 23, about 22, about 21, about 20, 
about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 12, about 11, about 
10, about 9, about 8, about 7, about 6, about 5, to about 4 amino acids, which is operatively 
associated with the nucleic acid encoding the protein that is soluble when folded correctly. 

F. Screening Assays 

The present invention is directed to the use of an α-complementation system to screen for
various aspects of protein folding and/or solubility. As discussed above, an important aspect of the
invention is the use of a fusion protein that contains sequences from the protein of interest as 
well as a portion of a marker protein. The marker protein, in the context of the fusion, is 
incapable of exhibiting its detectable phenotype. However, when expressed in an environment 
that also includes the complementing portion of the marker protein, "complementation" takes 
place and a detectable event occurs, assuming that the protein is properly folded and remains 
soluble. This assay provides many advantages, including fidelity, sensitivity, ease of handling, 
and ready adaptability. 

1. Methods 

There are three primary applications for the invention: screening of proteins for 
suitability in recombinant polypeptide production, screening for mutants or domain boundaries 
with altered folding and/or solubility profiles (e.g., diagnosis of disease), and screening for drugs 
that modulate protein folding and/or solubility. In the first embodiment, the method includes the 
steps of: 

a) providing an expression construct comprising (i) a gene encoding a fusion protein,
said fusion protein comprising a protein of interest fused to a first segment of a
marker protein, wherein said first segment does not affect the folding or solubility
of the protein of interest, or affects it only in a systematic (i.e., predictable and
repeatable) manner and (ii) a promoter active in said host cell and operably linked
to said gene;

b) expressing said fusion protein in a host cell that also expresses a second segment 
of said marker protein, wherein said second segment is capable of structural 
complementation with said first segment; and 

c) determining structural complementation. 

By comparing the degree of structural complementation in the method with that seen with 
appropriate negative controls, changes in folding and/or solubility of said protein can be 
determined. By looking at particular cell types from patients suspected of having particular 
disease states, this general method of screening can be transformed into a specific diagnostic 
method. 

In another embodiment, a method of screening for folding and/or solubility mutants is 
provided, and includes the steps of: 

a) providing a gene encoding a fusion protein comprising (i) a protein of interest and 
(ii) a first segment of a marker protein, wherein said first segment does not affect 
the folding or solubility of the protein of interest, or affects it only in a systematic 
(i.e., predictable and repeatable) manner, wherein said fusion protein is not 
properly folded and/or soluble when expressed in said host cell; 

b) mutagenizing that portion of the gene encoding said protein of interest; 

c) expressing said fusion protein in a host cell that expresses a second segment of 
said marker protein, wherein said second segment is capable of structural 
complementation with said first segment; and 

d) determining structural complementation. 

Again, a relative change in structural complementation, as compared to the structural 
complementation observed with the unmutagenized fusion protein, indicates a change in proper 
folding and/or solubility of said protein. An alternative embodiment involves the mutation of a 
gene of interest prior to its fusion with the marker protein segment. 

Finally, a third assay involves screening for candidate modulator substances that 
modulate protein folding and/or solubility, including the steps of: 

a) providing an expression construct comprising (i) a gene encoding a fusion protein, 
said fusion protein comprising a protein of interest fused to a first segment of a 
marker protein, wherein said first segment does not affect the folding or solubility 
of the protein of interest, or affects it only in a systematic (i.e., predictable and 
repeatable) manner, and (ii) a promoter active in said host cell and operably 
linked to said gene; 

b) expressing said fusion protein in a host cell that expresses a second segment of 
said marker protein, wherein said second segment is capable of structural 
complementation with said first segment; 

c) contacting the host cell with said candidate modulator substance; and 

d) determining structural complementation. 

Again, a relative change in structural complementation, as compared to the structural 
complementation observed in the absence of said candidate modulator substance, indicates that 
said candidate modulator substance is a modulator of protein folding and/or solubility. 
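
The comparison in the final step of the modulator screen can be sketched as a small calculation. This is only an illustration: the function names and the 25% cutoff are hypothetical and do not come from the disclosure, and any real screen would choose its own threshold from replicate controls.

```python
def relative_change(treated: float, untreated: float) -> float:
    """Fractional change in complementation signal versus the untreated control."""
    if untreated <= 0:
        raise ValueError("untreated control signal must be positive")
    return (treated - untreated) / untreated


def classify_candidate(treated: float, untreated: float, threshold: float = 0.25) -> str:
    """Label a candidate substance by the direction of its effect.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    change = relative_change(treated, untreated)
    if change >= threshold:
        return "enhancer"
    if change <= -threshold:
        return "inhibitor"
    return "no effect"
```

The same comparison applies to the mutant screen, with the unmutagenized fusion protein taking the place of the untreated control.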

2. Modulators 

As used herein the term "candidate substance" refers to any molecule that may potentially 
inhibit or enhance protein folding and/or solubility. The candidate substance may be a protein or 
fragment thereof, a small molecule, or even a nucleic acid molecule. Using lead compounds to 
help develop improved compounds is known as "rational drug design" and includes not only 
comparisons with known inhibitors and activators, but predictions relating to the structure of 
target molecules. 

The goal of rational drug design is to produce structural analogs of biologically active 
polypeptides or target compounds. By creating such analogs, it is possible to fashion drugs 
which are more active or stable than the natural molecules, which have different susceptibility to 
alteration or which may affect the function of various other molecules. In one approach, one 
would generate a three-dimensional structure for a target molecule, or a fragment thereof. This 
could be accomplished by x-ray crystallography, computer modeling or by a combination of both 
approaches. 

It also is possible to use antibodies to ascertain the structure of a target compound 
activator or inhibitor. In principle, this approach yields a pharmacophore upon which subsequent 
drug design can be based. It is possible to bypass protein crystallography altogether by 
generating anti-idiotypic antibodies to a functional, pharmacologically active antibody. As a 
mirror image of a mirror image, the binding site of the anti-idiotype would be expected to be an 
analog of the original antigen. The anti-idiotype could then be used to identify and isolate 
peptides from banks of chemically- or biologically-produced peptides. Selected peptides would 
then serve as the pharmacophore. Anti-idiotypes may be generated using the methods described 
herein for producing antibodies, using an antibody as the antigen. 

On the other hand, one may simply acquire, from various commercial sources, small 
molecule libraries that are believed to meet the basic criteria for useful drugs in an effort to 
"brute force" the identification of useful compounds. Screening of such libraries, including 
combinatorially generated libraries (e.g., peptide libraries), is a rapid and efficient way to screen 
a large number of related (and unrelated) compounds for activity. Combinatorial approaches also 
lend themselves to rapid evolution of potential drugs by the creation of second, third and fourth 
generation compounds modeled on active, but otherwise undesirable compounds. 

Candidate compounds may include fragments or parts of naturally-occurring compounds, 
or may be found as active combinations of known compounds, which are otherwise inactive. It 
is proposed that compounds isolated from natural sources, such as animals, bacteria, fungi, plant 
sources, including leaves and bark, and marine samples may be assayed as candidates for the 
presence of potentially useful pharmaceutical agents. It will be understood that the 
pharmaceutical agents to be screened could also be derived or synthesized from chemical 
compositions or man-made compounds. Thus, it is understood that the candidate substance 
identified by the present invention may be peptide, polypeptide, polynucleotide, small molecule 
inhibitors or any other compounds that may be designed through rational drug design starting 
from known inhibitors or stimulators. 

Other suitable modulators include antisense molecules, ribozymes, and antibodies 
(including single chain antibodies), each of which would be specific for the target molecule. 
Such compounds are described in greater detail elsewhere in this document. For example, 
antisense molecules that bind to a translational or transcriptional start site, or to splice 
junctions, would be ideal candidate inhibitors. 

In addition to the modulating compounds initially identified, the inventors also 
contemplate that other sterically similar compounds may be formulated to mimic the key 
portions of the structure of the modulators. Such compounds, which may include 
peptidomimetics of peptide modulators, may be used in the same manner as the initial 
modulators. 

3. Assay Formats 

A quick, inexpensive and easy assay to run is an in vitro assay. Various cell lines can be 
utilized for such screening assays, including cells specifically engineered for this purpose, as 
discussed in detail above. Depending on the assay, culture may be required. The cell is 
examined using α-complementation as a readout. Alternatively, molecular analysis may be 
performed, for example, looking at protein expression, mRNA expression (including differential 
display of whole cell or polyA RNA) and others. 

In vivo assays involve the use of various animal models, including transgenic animals that 
have been engineered to express both the fusion protein (target protein + first marker segment) 
and the complementing molecule (second marker segment). Due to their size, ease of handling, 
and information on their physiology and genetic make-up, mice are a preferred embodiment, 
especially for transgenics. However, other animals are suitable as well, including insects, 
nematodes, rats, rabbits, hamsters, guinea pigs, gerbils, woodchucks, cats, dogs, sheep, goats, 
pigs, cows, horses and monkeys (including chimps, gibbons and baboons). Assays for 
modulators may be conducted using an animal model derived from any of these species. 

In such assays, one or more candidate substances are administered to an animal, and the 
ability of the candidate substance(s) to alter protein folding and/or solubility, as compared to a 
similar animal not treated with the candidate substance(s), identifies a modulator. 

Treatment of these animals with candidate substances will involve the administration of 
the compound, in an appropriate form, to the animal. Administration will be by any route that 
could be utilized for clinical or non-clinical purposes, including but not limited to oral, nasal, 
buccal, or even topical. Alternatively, administration may be by intratracheal instillation, 
bronchial instillation, intradermal, subcutaneous, intramuscular, intraperitoneal or intravenous 
injection. Specifically contemplated routes are systemic intravenous injection, regional 
administration via blood or lymph supply, or directly to an affected site. 

Determining the effectiveness of a compound in vivo may involve a variety of different 
criteria. Also, measuring toxicity and dose response can be performed in animals in a more 
meaningful fashion than in in vitro or in cyto assays. 

4. High Throughput and Flow Cytometry 

High throughput formats are of particular use in drug screening. Flow cytometry 
involves the separation of cells or other particles in a liquid sample based upon signals generated 
in the host cells. Generally, the purpose of flow cytometry is to analyze the separated particles 
for one or more characteristics thereof. The basic steps of flow cytometry involve the direction 
of a fluid sample through an apparatus such that a liquid stream passes through a sensing region. 
The particles should pass one at a time by the sensor and are categorized based on size, refraction, 
light scattering, opacity, roughness, shape, fluorescence, etc. 

Rapid quantitative analysis of cells proves useful in biomedical research and medicine. 
Such apparatuses permit quantitative multiparameter analysis of cellular properties at rates of several 
thousand cells per second. These instruments provide the ability to differentiate among cell 
types. Data are often displayed in one-dimensional (histogram) or two-dimensional (contour 
plot, scatter plot) frequency distributions of measured variables. The partitioning of 
multiparameter data files involves consecutive use of the interactive one- or two-dimensional 
graphics programs. 

Quantitative analysis of multiparameter flow cytometric data for rapid cell detection 
consists of two stages: cell class characterization and sample processing. In general, the process 
of cell class characterization partitions the cell feature space into regions of interest and not of interest. 
Then, in sample processing, each cell is classified in one of the two categories according to the 
region in which it falls. Analysis of the class of cells is very important, as high detection 
performance may be expected only if an appropriate characteristic of the cells is obtained. 
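
The two stages described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular instrument's software: the rectangular gate, the bounding-box rule for deriving it, and all names are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Event = Tuple[float, float]  # two measured parameters for one cell


@dataclass(frozen=True)
class Gate:
    """Rectangular region of interest in a two-parameter feature space."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def characterize(reference_events: Iterable[Event]) -> Gate:
    """Stage 1: define the region of interest as the bounding box of reference cells."""
    points = list(reference_events)
    if not points:
        raise ValueError("need at least one reference event")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Gate(min(xs), max(xs), min(ys), max(ys))


def process_sample(gate: Gate, events: Iterable[Event]) -> List[bool]:
    """Stage 2: classify each event as of interest (True) or not (False)."""
    return [gate.contains(x, y) for x, y in events]
```

A bounding box is the crudest possible partition; it is used here only because it makes the dependence of the second stage on the first easy to see.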

Not only is cell analysis performed by flow cytometry, but so too is sorting of cells. In 
U.S. Patent 3,826,364 (incorporated by reference), an apparatus is disclosed which physically 
separates particles, such as functionally different cell types. In this machine, a laser provides 
illumination which is focused on the stream of particles by a suitable lens or lens system so that 
there is highly localized scatter from the particles therein. In addition, high intensity source 
illumination is directed onto the stream of particles for the excitation of fluorescent particles in 
the stream. Certain particles in the stream may be selectively charged and then separated by 
deflecting them into designated receptacles. A classic form of this separation is via 
fluorescent-tagged antibodies, which are used to mark one or more cell types for separation. 

Other methods for flow cytometry can be found in U.S. Patents 4,284,412; 4,989,977; 
4,498,766; 5,478,722; 4,857,451; 4,774,189; 4,767,206; 4,714,682; 5,160,974; and 4,661,913, all 
of which are incorporated by reference. 

G. Examples 

The following examples are included to demonstrate preferred embodiments of the 
invention. It should be appreciated by those of skill in the art that the techniques disclosed in the 
examples which follow represent techniques discovered by the inventor to function well in the 
practice of the invention, and thus can be considered to constitute preferred modes for its 
practice. However, those of skill in the art should, in light of the present disclosure, appreciate 
that many changes can be made in the specific embodiments which are disclosed and still obtain 
a like or similar result without departing from the spirit and scope of the invention. 

EXAMPLE 1: MATERIALS AND METHODS 

Antibodies, Chemicals and Expression Vectors 

Monoclonal mouse anti-HA and polyclonal sheep anti-MBP antibodies were purchased 
from BabCO (Richmond, CA). Horseradish peroxidase-conjugated (HRP) secondary antibodies 
were from Jackson ImmunoResearch Laboratories (West Grove, PA). Isopropyl-p-D- 
thiogalactopyranoside (IPTG) and 5-bromo-4-chloro-3-indolyl-p-D-galactopyranoside (X-gal) 
were from Boehringer Mannheim (Indianapolis, IN). O-mtrophenyl-p-D-galactopyranoside 
(ONPG) was purchased from Sigma (St. Louis, MO). The expression vector pMAL-c2x, coding 
for an MBP-a fusion, was from New England Biolabs (Beverly, MA). A plasmid containing 
cDNA for the LivF protein of M. jannaschii (MJ1267) was obtained from the American Type 
Culture Collection. Plasmid pAPP770 containing cDNA for the Alzheimer's precursor protein 
(APP) was the generous gift of Dr. J. Herz, Dept. Molecular Genetics, UT Southwestern, Dallas, 
TX. Plasmid pTRx.parallell containing cDNA for thioredoxin was the generous gift of Dr. K. 
Gardner, Dept. Biochemistry, UT Southwestern, Dallas, TX. Plasmid pGex-2t containing cDNA 
for glutathione S-transferase was from Amersham/Pharmacia (Piscataway, NJ). 

Construction of α-Fusion Expression Vectors 

Complementary DNA fragments coding for residues 404-644 (NBD1-B) and 419-655 
(NBD1-D) of CFTR were excised using NdeI and XhoI from pET28a expression plasmids 
generated as previously described (Qu & Thomas, 1996). Based upon homology to the recently 
published HisP NBD crystal structure (Hung et al., 1998), these constructs are predicted to 
contain the entire first NBD of CFTR. The resulting fragments were ligated into 
NdeI/SalI-digested pMal-c2x in place of the maltose-binding protein (MBP), forming an 
in-frame fusion with the α-fragment (residues 7-58 from full length β-galactosidase). 
Expression cassette PCR was used to assemble the other α-fusion constructs examined. The 
MJ1267 cDNA was also subcloned into the NdeI and SalI sites of pMal-c2x. The resulting 
vector contained an in-frame stop codon between MJ1267 and the polylinker of pMAL-c2x, 
which was removed by site-directed mutagenesis, completing the α-fusion construct. TRx, GST 
and Aβ (APP residues 1-42) were each ligated into NdeI/SacI-digested pMal-c2x. The cloning 
strategy used to assemble the tandem Aβ/α-fusion construct, Aβ-rpt, was similar to that 
described elsewhere (Culvenor et al., 1998), and utilized an internal EcoRI site to generate an 
exact Aβ(1-42) repeat with no intervening sequence. All targets were subcloned in the pMal-c2x 
vector and, therefore, utilize the same promoter. In addition, the ABC transporter NBDs 
evaluated were also expressed in BL21 cells under the control of the T7 promoter of pET28a. In 
each case, fidelity of PCR™ products and constructs was verified by restriction mapping and 
DNA sequencing. 

To serve as a marker for some of the expressed proteins (MJ1267, CFTR-NBD1, TRx, 
GST and Aβ), an HA-tag sequence was introduced into the SalI site of the pMal-c2x expression 
vector using two annealed complementary oligonucleotides coding for the tag sequence and 
flanked by SalI linker sequences. Correct orientation of the resulting ligation products was 
confirmed by DNA sequencing. 

Site-directed mutagenesis 

Oligonucleotide-directed mutagenesis using the QuikChange mutagenesis kit 
(Stratagene, La Jolla, CA) was performed to generate the mutant MBP proteins in the expression 
vector pMal-c2x. The sequences of the antisense mutagenic primers used are as follows: 

G32D/I33P - 
5'-GATGCTCAACGGTGACTTTAGGATCGGTATCTTCTCGAATTTC-3' 

G32D - 
5'-CAACGGTGACTTTAATATCGGTATCTTTCTCG-3' 

I33P - 
5'-GGTGACTTTAGGTCCGGTATCTTTCTCG-3' 

Mutation incorporation was verified by DNA sequencing. Plasmid DNA was purified using 
reagents supplied by Qiagen Inc. 

Expression of fusion proteins 

Expression constructs were transformed into DH5α E. coli by standard methods and 
colonies selected on LB-agar plates supplemented with 100 μg/mL ampicillin (amp). From 
single colonies, 10 mL LB + amp cultures were inoculated and allowed to grow overnight at 
37°C. The following day, the overnight culture was diluted 1000-fold into a fresh 10 mL LB 
+ amp culture and allowed to grow to mid log phase (OD600 ~ 0.5). Protein production was 
induced by the addition of IPTG to 0.3 mM and the cells were further incubated for the indicated 
times. 

In vitro assay of β-gal complementation 

After the completion of fusion protein expression, cells (1.5 mL) were harvested by 
centrifugation at 10,000 x g for two minutes. After removal of the supernatants, the cell pellets 
were resuspended in 1 mL of buffer Z (10 mM KCl, 2.0 mM MgSO4, 100 mM NaHPO4, pH 7.0). 
The cells were pelleted again, resuspended in 0.3 mL buffer Z and lysed by three freeze/thaw 
cycles between liquid nitrogen and a 37°C water bath. Next, 0.1 mL of the resulting cell lysate 
was transferred to a clean microfuge tube to which buffer Z (0.7 mL) supplemented with 0.27% 
β-mercaptoethanol was added. Reactions were initiated by the addition of 160 μL of ONPG 
solution (4.0 mg/mL dissolved in buffer Z) and incubated at 37°C for 10 min. Reactions were 
quenched by the addition of 0.4 mL 1 M Na2CO3. Tubes were then centrifuged at 10,000 x g for 
10 min to remove debris and the absorbance of the supernatant at 420 nm was measured. 
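
As a rough illustration of how an absorbance reading from this protocol could be converted into enzyme units (micromoles of ONPG hydrolyzed per minute), the sketch below applies the Beer-Lambert law to the quenched reaction. The volumes and the 10 min incubation are taken from the protocol above; the molar extinction coefficient of o-nitrophenol (4500 per M per cm, a commonly quoted figure) and the 1 cm path length are assumptions, as are the function names. Normalization to cell number is not attempted here.

```python
def reaction_volume_ml(lysate: float = 0.1, buffer_z: float = 0.7,
                       onpg: float = 0.16, quench: float = 0.4) -> float:
    """Total volume of the quenched reaction, in mL, from the protocol volumes."""
    return lysate + buffer_z + onpg + quench


def bgal_units(a420: float, minutes: float = 10.0, volume_ml: float = 1.36,
               epsilon_per_m_cm: float = 4500.0, path_cm: float = 1.0) -> float:
    """Micromoles of ONPG hydrolyzed per minute in the assayed lysate aliquot.

    epsilon_per_m_cm and path_cm are assumed values, not taken from the text.
    """
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    molar = a420 / (epsilon_per_m_cm * path_cm)        # mol/L of o-nitrophenol
    micromoles = molar * (volume_ml / 1000.0) * 1e6    # total product in the tube
    return micromoles / minutes
```

Under these assumptions, the result scales linearly with the absorbance reading, so relative comparisons between constructs do not depend on the exact extinction coefficient chosen.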

Analysis of soluble and insoluble fractions 

To biochemically analyze the solubility characteristics of the expressed fusion proteins, 3 
mL of culture from cells induced for the indicated times was harvested by centrifugation, washed 
once and resuspended in 600 μL lysis solution (100 mM NaCl, 1 mM EDTA, 50 mM Tris-Cl, pH 
7.6). The cell suspensions were lysed by sonication three times for 30 sec at 50% duty cycle 
and power output of 4 using a Branson model 450 sonifier fitted with a microtip probe. All 
manipulations were carried out on ice. After sonication, the solution was centrifuged to separate 
soluble and insoluble fractions at 10,000 x g in a microfuge at 4°C for 10 min. Supernatant and 
pellet fractions were analyzed by SDS PAGE and Western blotting where appropriate. 

SDS PAGE and Western blotting 

Expressed proteins were analyzed by electrophoresis through 10% Tricine-SDS 
polyacrylamide gels using the buffer system of Schagger and von Jagow (1987). Protein bands 
were visualized by staining with Coomassie blue. For Western immunoblotting, standard 
methods were employed for transfer of proteins from gels to nitrocellulose. Resulting 
membranes were blocked in TBS containing Tween-20 and 10% dehydrated milk for at least 1 hr 
and incubated at room temperature with the indicated primary antibodies. Immunoreactive 
bands were visualized by ECL (Amersham, Piscataway, NJ) using appropriate HRP-conjugated 
secondary antibodies and X-ray film. The density of bands on Coomassie-stained gels and 
exposed X-ray film was measured on an Agfa Arcus scanner and quantified using Molecular 
Analyst software (BioRad, Hercules, CA). 

Blue/white screening for β-gal complementation 

Single colonies of DH5α containing the individual expression constructs were analyzed 
for the ability of the α-fusion proteins to complement β-gal activity in vivo. Bacteria harboring 
each construct were streaked to single colonies on LB-agar plates supplemented with 100 μg/mL 
ampicillin, 80 μg/mL X-gal, and 0.1 mM IPTG. The plates were incubated at 37°C for 18 to 48 
hr and activity of β-gal was assessed by visualization of blue color in α-complementing colonies. 

Colorimetric screening for β-gal complementation in 96-well plates 

Cells harboring each of the indicated expression constructs were grown to mid log phase 
(OD600 ~ 0.5) from overnight cultures as described above. 125 μl of each culture was transferred 
to individual wells of a flat-bottom 96-well plate containing 125 μl LB media supplemented with 
100 μg/mL ampicillin and 0.6 mM IPTG (resulting in a final [IPTG] of 0.3 mM). The plates 
were then placed on an orbit shaker at 37°C with rapid shaking. After induction for 1 hr, X-gal 
was added to a final concentration of 80 μg/mL, and the plate was returned to the shaker at 37°C 
overnight. 
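
The final IPTG concentration quoted above follows from simple dilution arithmetic: 125 μl of medium containing 0.6 mM IPTG is mixed with an equal volume of culture containing none. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
def final_concentration(stock_conc: float, stock_volume: float,
                        diluent_volume: float) -> float:
    """Concentration after mixing a stock with a diluent that contains no solute.

    Volumes may be in any unit as long as both use the same one.
    """
    total = stock_volume + diluent_volume
    if total <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume / total


# 125 ul of medium at 0.6 mM IPTG plus 125 ul of culture
iptg_mm = final_concentration(0.6, 125.0, 125.0)
```

With the volumes from the protocol, `iptg_mm` evaluates to 0.3 mM, in agreement with the stated final concentration.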

EXAMPLE 2: RESULTS 

In order to test the ability of α-fragment chimeras to complement the ω-fragment of β-gal 
and report target protein solubility by producing active β-gal, model polypeptides were fused to 
the N-terminus of the α-fragment in an inducible bacterial expression plasmid (FIG. 1B). Initial 
experiments focused on the maltose binding protein (MBP) of E. coli. MBP is normally secreted 
into the periplasm of E. coli; however, the construct used in the present study lacks the required 
leader sequence and therefore folds in the cytoplasm, where the ω-fragment is located. 

To assess the relative abilities of the expressed α-fusion proteins to complement β-gal 
activity in vivo, E. coli harboring the fusion expression constructs were plated on IPTG/X-gal 
indicator plates and the development of blue color in resulting colonies was monitored. 
pUC19-transformed DH5α E. coli, which express a 54 residue α-fragment (residues 6-59 of β-gal), are 
the most intensely blue. This represents the level of β-gal complementation attributable to the 
α-fragment alone. The MBP-α fusion protein (MBP residues 1-366, α: residues 7-58 of β-gal) 
also yields significant α-complementation, although less than observed for pUC19 
(Yanisch-Perron et al., 1985). 

Previously, several mutations were identified which lead to diminished solubility and 
reduced periplasmic yield of MBP (Betton & Hofnung, 1996). For example, mutation of two 
residues, I33P and G32D, decreased soluble periplasmic MBP by more than 100-fold. This 
double mutation was introduced into the MBP/α fusion construct, which was then monitored for 
α-complementation on indicator plates. The wild-type MBP and the double mutant were expressed at 
equivalent levels. Consistent with the previously reported effect of these mutations on the in 
vivo solubility of MBP, the G32D/I33P double mutation significantly impaired the solubility and, 
thus, ability of the fusion protein to complement β-gal activity on indicator plates. 

To test the generality of the assay system, a series of α-fusion constructs were generated. 
Fusion to α of either TRx or GST (two highly soluble proteins used regularly as fusions to aid in 
the solubility of ill-behaved partners) and expression in DH5α on indicator plates results in blue 
color development that is as intense as that observed for the MBP/α fusion construct. Next, a 
series of nucleotide binding domains (NBD) from two ATP binding-cassette (ABC) transporters 
were generated and examined. Two are polypeptides predicted to include the first NBD of the 
cystic fibrosis transmembrane conductance regulator (CFTR): NBD1-B (CFTR residues 
404-644), and NBD1-D (CFTR residues 419-655). This domain has poor solubility properties due 
either to inherently limited solubility in the absence of other domains of the protein with which it 
normally interacts, or to marginal stability/misfolding or both. Several mutations within this 
domain prevent proper folding of the full length CFTR in vivo and, thus, lead to cystic fibrosis. 
The third NBD, LivF (MJ1267), is a subunit of the branched chain amino acid transporter from 
the hyperthermophilic archaeon M. jannaschii. CFTR NBD1 has been shown to be insoluble, 
forming inclusion bodies when expressed in E. coli (Qu & Thomas, 1996), unless fused to 
a soluble protein such as wild-type MBP (Ko et al., 1993) or GST (King & Sorscher, 1998). 
MJ1267, however, has proven much more soluble, yielding 10% soluble protein from a T7 
expression system in BL21 E. coli. 

When expressed in DH5α on indicator plates, both CFTR NBD/α fusions result in very 
little blue color, even after 48 hr of growth, although the NBD1-D/α fusion appears to 
complement measurably more than NBD1-B. By contrast, expression of the MJ1267/α fusion 
results in a significantly elevated level of blue color when compared to either of the CFTR 
NBD/α fusion proteins. The MBP/α fusion proteins express at higher levels than the NBD/α 
fusions as a group, and thus yield more activity. It should be noted that relative levels of 
α-complementation, as evidenced by blue color on indicator plates, can be observed at the single 
colony level for each of the constructs tested, providing a measure that is independent of plated 
cell density. 

To test whether the α-complementation assay is adaptable to a format amenable to 
rapid-throughput screening, the constructs described above were analyzed for the development of blue 
color in a 96-well plate β-gal assay. The levels of blue color obtained in the microtiter plate 
assay for each construct agree well with those obtained in the agar plate assay. In fact, the 
difference in color levels observed upon comparison of the two CFTR-NBD/α-fusions is more 
apparent in the 96-well plate assay. 

To verify the hypothesis that the intensity of blue color on indicator plates is reporting 
target protein solubility, the amount of soluble versus insoluble protein was measured in 
biochemical fractionation experiments. E. coli expressing wild-type, G32D, I33P, and 
G32D/I33P-MBP/α fusions were subjected to cell disruption and fractionation by centrifugation. 
Analysis by SDS PAGE of the soluble and insoluble fractions for each fusion protein revealed a 
correlation between solubility and level of blue color on X-gal plates. It is important to note that 
the agar plate β-gal assay, after long incubation times, is most sensitive to changes from 
insoluble to higher levels of solubility, the range of greatest practical utility. The wild-type 
MBP/α fusion fractionates primarily to the supernatant, while the double mutant (G32D/I33P) 
fractionates primarily to the pellet. Fractionation results were further confirmed by Western 
blots probed with anti-MBP antibodies. The fraction of MBP/α fusions that are soluble is in 
agreement with the previously published stability and folding yield of these mutants without the 
α-fragment marker (Betton & Hofnung, 1996). This suggests that the α-fragment does not 
significantly impact the overall solubility characteristics of the MBP fusion proteins and is 
therefore a good reporter of target protein solubility. Similarly, the high levels of blue color 
observed for the GST/α and TRx/α fusions correlate well with the biochemical fractionation 
experiments, which indicate that a majority of both of these proteins partitions to the soluble fraction. 

A correlation between the biochemical solubility and α-complementation (as indicated by 
blue color of colonies in the plate assays) also was demonstrated for the NBD/α fusion 
constructs. Both CFTR NBD/α fusion proteins exhibit little to no blue color, and virtually all of 
the fusion protein partitions to the insoluble fraction whether expressed with (DH5α expression) 
or without (BL21 expression) the α-fragment. In contrast, MJ1267, when expressed as an 
α-fragment fusion, produces a significantly higher level of blue color relative to either of the 
CFTR-NBD/α fusions. This correlates with the partial solubility of MJ1267 either with (DH5α 
expression) or without (BL21 expression) the α-fragment. Taken together, these results 
suggest that in these cases, the relatively small α-fragment, when fused to a target polypeptide, 
does not have large effects on the target's solubility; neither increasing that of the otherwise 
insoluble targets (CFTR-NBDs), nor decreasing that of the partially soluble one (MJ1267). 

A quantitative measure of α-complementation of β-gal by each of the fusion targets was 
obtained by the direct measurement of activity in cell lysates. A total of four MBP folding 
variants were utilized to establish the quantitative relationship within a target system between 
β-gal activity and biochemical solubility. Table 3 summarizes the results of these in vitro enzyme 
assays. 



TABLE 3 

Target Protein       β-gal Activity (units/cell) 

MBP wild-type        102 +/- 19 
G32D                 94 +/- 21 
I33P                 46 +/- 12 
G32D/I33P            14 +/- 3 
GST                  134 +/- 8 
TRx                  159 +/- 14 
CFTR NBD1-B          5 +/- 1 
CFTR NBD1-D          6 +/- 2 
MJ1267 (LivF)        12 +/- 6 


A unit of β-gal activity is defined as the amount of enzyme required to hydrolyze one μmole of 
ONPG to o-nitrophenol and D-galactose per minute. Note that the polylinker between MBP (and 
mutants thereof) and the α-fragment is 36 residues in length. This linker was reduced to 9 residues 
during construction of the CFTR-, LivF-, GST-, and TRx-α fusion constructs. 

Activity correlates well with the relative levels of blue color observed for these constructs. The 
plate assay is less able to distinguish highly soluble targets from those of intermediate solubility 
(MBP single mutants) most likely due to integration of the signal during growth of the colonies. 
FIG. 2 shows a linear relationship between the enzymatic activity (Table 3) and the biochemical 
soluble fraction for each of the MBP/α fusions as assessed by densitometry of 
Coomassie-stained gels. Again, the activities show a linear correlation with the periplasmic folding yields 
for the unfused MBPs reported by Betton and Hofnung (1996), further supporting the assay's 
ability to report on the intrinsic folding/solubility properties of the target proteins. The differing 
magnitude of the effects reported here when compared with those previously reported by Betton 
and Hofnung (1996) may reflect the cellular environments where folding takes place since the 
present constructs must fold in the cytoplasm. 
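
To make the comparison among the MBP folding variants concrete, the mean activities from Table 3 can be expressed relative to the wild-type MBP fusion. The sketch below uses only the Table 3 values; the dictionary layout and function name are illustrative choices, and the soluble fractions plotted in FIG. 2 are not reproduced here.

```python
from typing import Dict, Tuple

# (mean, reported spread) in units/cell, copied from Table 3
MBP_VARIANTS: Dict[str, Tuple[float, float]] = {
    "MBP wild-type": (102.0, 19.0),
    "G32D": (94.0, 21.0),
    "I33P": (46.0, 12.0),
    "G32D/I33P": (14.0, 3.0),
}


def relative_activity(table: Dict[str, Tuple[float, float]],
                      reference: str = "MBP wild-type") -> Dict[str, float]:
    """Mean activity of each target divided by the mean of the reference target."""
    ref_mean, _ = table[reference]
    if ref_mean <= 0:
        raise ValueError("reference activity must be positive")
    return {name: mean / ref_mean for name, (mean, _) in table.items()}


relative = relative_activity(MBP_VARIANTS)
```

On these numbers the G32D, I33P and G32D/I33P fusions retain roughly 0.92, 0.45 and 0.14 of the wild-type activity, respectively. The reported spreads are not propagated in this sketch.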

In addition to cystic fibrosis, many other human diseases are associated with 
inappropriate folding and/or aggregation of proteins (Thomas et al., 1995; Tan & Pepys, 1994; 
Wells & Warren, 1998). To test whether the structural complementation assay has application to 
such proteins, the Alzheimer's Aβ(1-42) peptide, which forms insoluble fibrils in the brains of 
affected individuals, was selected as an additional test case. When fused to the α-fragment and 
expressed in E. coli on indicator plates, the fusion protein is unable to efficiently complement 
β-gal activity, resulting in very little development of blue color. In contrast, mutation of 
phenylalanine to proline at position 19 of Aβ (F19P), a mutation known to retard fibril formation 
in vitro (Wood et al., 1995), results in a clear and measurable increase in blue color on indicator 
plates, approximately a three-fold increase in β-gal activity, and increased fusion protein in the 
soluble fraction at equivalent levels of expression. Recently, Culvenor and co-workers reported 
the production of "large intracellular deposits" of Aβ-immunoreactive material upon the 
expression of Aβ(1-42) as a tandem head-to-tail duplex in yeast (Culvenor et al., 1998). To 
assess the ability of this assay to report on the solubility state of such a construct, the inventors 
assembled and expressed a tandem repeat of Aβ as a fusion with the α-fragment (Aβ-rpt). 
Colonies expressing the Aβ-rpt/α fusion protein exhibit no detectable blue color on indicator 
plates, in vitro β-gal activity less than that observed for the wild-type Aβ/α fusion, and no 
detectable protein in the soluble fraction. Interestingly, the Aβ-rpt protein aggregates to form 
a ladder of increasingly higher molecular weight insoluble species, a property absent from the 
single Aβ/α fusion and perhaps more reflective of the disease condition. 

All of the compositions and/or methods disclosed and claimed herein can be made and 
executed without undue experimentation in light of the present disclosure. While the 
compositions and methods of this invention have been described in terms of preferred 
embodiments, it will be apparent to those of skill in the art that variations may be applied to the 
compositions and/or methods and in the steps or in the sequence of steps of the method described 
herein without departing from the concept, spirit and scope of the invention. More specifically, 
it will be apparent that certain agents which are both chemically and physiologically related may 
be substituted for the agents described herein while the same or similar results would be 
achieved. All such similar substitutes and modifications apparent to those skilled in the art are 
deemed to be within the spirit, scope and concept of the invention as defined by the appended 
claims. 